Book description
Get savvy with the R language and build projects aimed at analysis, visualization, and machine learning
About This Book
 Proficiently analyze data and apply machine learning techniques
 Generate visualizations and develop interactive visualizations and applications to understand various exploratory data functions in R
 Construct a predictive model by using a variety of machine learning packages
Who This Book Is For
This Learning Path is ideal for those who have been exposed to R, but have not used it extensively yet. It covers the basics of using R and is written for new and intermediate R users interested in learning. This Learning Path also provides in-depth insights into professional techniques for analysis, visualization, and machine learning with R; it will help you increase your R expertise, regardless of your level of experience.
What You Will Learn
 Get data into your R environment and prepare it for analysis
 Perform exploratory data analyses and generate meaningful visualizations of the data
 Generate various plots in R using the basic R plotting techniques
 Create presentations and learn the basics of creating apps in R for your audience
 Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm
 Visualize associations in various graph formats and find frequent itemsets using the Eclat algorithm
 Build, tune, and evaluate predictive models with different machine learning packages
 Incorporate R and Hadoop to solve machine learning problems on big data
In Detail
The R language is a powerful, open source, functional programming language. At its core, R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics. This Learning Path is chock-full of recipes. Literally! It aims to excite you with awesome projects focused on analysis, visualization, and machine learning. We'll start off with data analysis: this will show you ways to use R to generate professional analysis reports. We'll then move on to visualizing our data: this provides you with all the guidance needed to get comfortable with data visualization with R. Finally, we'll move into the world of machine learning: this introduces you to data classification, regression, clustering, association rule mining, and dimension reduction.
This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
 R Data Analysis Cookbook by Viswa Viswanathan and Shanthi Viswanathan
 R Data Visualization Cookbook by Atmajitsinh Gohil
 Machine Learning with R Cookbook by Yu-Wei, Chiu (David Chiu)
Style and approach
This course creates a smooth learning path that will teach you how to analyze data and create stunning visualizations. The step-by-step instructions provided for each recipe in this comprehensive Learning Path will show you how to create machine learning projects with R.
Table of contents

R: Recipes for Analysis, Visualization and Machine Learning
 Table of Contents
 Credits
 Preface

1. Module 1

1. A Simple Guide to R
 Installing packages and getting help in R
 Data types in R
 Special values in R
 Matrices in R
 Editing a matrix in R
 Data frames in R
 Editing a data frame in R
 Importing data in R
 Exporting data in R
 Writing a function in R
 Writing if else statements in R
 Basic loops in R
 Nested loops in R
 The apply, lapply, sapply, and tapply functions
 Using par to beautify a plot in R
 Saving plots
 2. Practical Machine Learning with R

3. Acquire and Prepare the Ingredients – Your Data
 Introduction
 Reading data from CSV files
 Reading XML data
 Reading JSON data
 Reading data from fixed-width formatted files
 Reading data from R files and R libraries
 Removing cases with missing values
 Replacing missing values with the mean
 Removing duplicate cases
 Rescaling a variable to [0,1]
 Normalizing or standardizing data in a data frame
 Binning numerical data
 Creating dummies for categorical variables

4. What's in There? – Exploratory Data Analysis
 Introduction
 Creating standard data summaries
 Extracting a subset of a dataset
 Splitting a dataset
 Creating random data partitions
 Generating standard plots such as histograms, boxplots, and scatterplots
 Generating multiple plots on a grid
 Selecting a graphics device
 Creating plots with the lattice package
 Creating plots with the ggplot2 package
 Creating charts that facilitate comparisons
 Creating charts that help visualize a possible causality
 Creating multivariate plots

5. Where Does It Belong? – Classification
 Introduction
 Generating error/classification-confusion matrices
 Generating ROC charts
 Building, plotting, and evaluating – classification trees
 Using random forest models for classification
 Classifying using Support Vector Machine
 Classifying using the Naïve Bayes approach
 Classifying using the KNN approach
 Using neural networks for classification
 Classifying using linear discriminant function analysis
 Classifying using logistic regression
 Using AdaBoost to combine classification tree models

6. Give Me a Number – Regression
 Introduction
 Computing the root mean squared error
 Building KNN models for regression
 Performing linear regression
 Performing variable selection in linear regression
 Building regression trees
 Building random forest models for regression
 Using neural networks for regression
 Performing k-fold cross-validation
 Performing leave-one-out cross-validation to limit overfitting
 7. Can You Simplify That? – Data Reduction Techniques
 8. Lessons from History – Time Series Analysis

9. It's All About Your Connections – Social Network Analysis
 Introduction
 Downloading social network data using public APIs
 Creating adjacency matrices and edge lists

Plotting social network data
 Getting ready
 How to do it...
 How it works...

There's more...
 Specifying plotting preferences
 Plotting directed graphs
 Creating a graph object with weights
 Extracting the network as an adjacency matrix from the graph object
 Extracting an adjacency matrix with weights
 Extracting edge list from graph object
 Creating bipartite network graph
 Generating projections of a bipartite network
 See also...
 Computing important network metrics
 10. Put Your Best Foot Forward – Document and Present Your Analysis

11. Work Smarter, Not Harder – Efficient and Elegant R Code
 Introduction
 Exploiting vectorized operations
 Processing entire rows or columns using the apply function
 Applying a function to all elements of a collection with lapply and sapply
 Applying functions to subsets of a vector
 Using the split-apply-combine strategy with plyr
 Slicing, dicing, and combining data with data tables

12. Where in the World? – Geospatial Analysis
 Introduction
 Downloading and plotting a Google map of an area
 Overlaying data on the downloaded Google map
 Importing ESRI shape files into R
 Using the sp package to plot geographic data
 Getting maps from the maps package
 Creating spatial data frames from regular data frames containing spatial and other data
 Creating spatial data frames by combining regular data frames with spatial objects
 Adding variables to an existing spatial data frame
 13. Playing Nice – Connecting to Other Systems

2. Module 2

1. Basic and Interactive Plots
 Introduction
 Introducing a scatter plot
 Scatter plots with texts, labels, and lines
 Connecting points in a scatter plot
 Generating an interactive scatter plot
 A simple bar plot
 An interactive bar plot
 A simple line plot
 Line plot to tell an effective story
 Generating an interactive Gantt/timeline chart in R
 Merging histograms
 Making an interactive bubble plot
 Constructing a waterfall plot in R
 2. Heat Maps and Dendrograms
 3. Maps
 4. The Pie Chart and Its Alternatives
 5. Adding the Third Dimension
 6. Data in Higher Dimensions

7. Visualizing Continuous Data
 Introduction
 Generating a candlestick plot
 Generating interactive candlestick plots
 Generating a decomposed time series
 Plotting a regression line
 Constructing a box and whiskers plot
 Generating a violin plot
 Generating a quantile-quantile plot (Q-Q plot)
 Generating a density plot
 Generating a simple correlation plot
 8. Visualizing Text and XKCD-style Plots
 9. Creating Applications in R

3. Module 3

1. Data Exploration with RMS Titanic
 Introduction
 Reading a Titanic dataset from a CSV file
 Converting types on character variables
 Detecting missing values
 Imputing missing values
 Exploring and visualizing data
 Predicting passenger survival with a decision tree
 Validating the power of prediction with a confusion matrix
 Assessing performance with the ROC curve

2. R and Statistics
 Introduction
 Understanding data sampling in R
 Operating a probability distribution in R
 Working with univariate descriptive statistics in R
 Performing correlations and multivariate analysis
 Operating linear regression and multivariate analysis
 Conducting an exact binomial test
 Performing Student's t-test
 Performing the Kolmogorov-Smirnov test
 Understanding the Wilcoxon Rank Sum and Signed Rank test
 Working with Pearson's Chi-squared test
 Conducting a one-way ANOVA
 Performing a two-way ANOVA

3. Understanding Regression Analysis
 Introduction
 Fitting a linear regression model with lm
 Summarizing linear model fits
 Using linear regression to predict unknown values
 Generating a diagnostic plot of a fitted model
 Fitting a polynomial regression model with lm
 Fitting a robust linear regression model with rlm
 Studying a case of linear regression on SLID data
 Applying the Gaussian model for generalized linear regression
 Applying the Poisson model for generalized linear regression
 Applying the Binomial model for generalized linear regression
 Fitting a generalized additive model to data
 Visualizing a generalized additive model
 Diagnosing a generalized additive model

4. Classification (I) – Tree, Lazy, and Probabilistic
 Introduction
 Preparing the training and testing datasets
 Building a classification model with recursive partitioning trees
 Visualizing a recursive partitioning tree
 Measuring the prediction performance of a recursive partitioning tree
 Pruning a recursive partitioning tree
 Building a classification model with a conditional inference tree
 Visualizing a conditional inference tree
 Measuring the prediction performance of a conditional inference tree
 Classifying data with the k-nearest neighbor classifier
 Classifying data with logistic regression
 Classifying data with the Naïve Bayes classifier

5. Classification (II) – Neural Network and SVM
 Introduction
 Classifying data with a support vector machine
 Choosing the cost of a support vector machine
 Visualizing an SVM fit
 Predicting labels based on a model trained by a support vector machine
 Tuning a support vector machine
 Training a neural network with neuralnet
 Visualizing a neural network trained by neuralnet
 Predicting labels based on a model trained by neuralnet
 Training a neural network with nnet
 Predicting labels based on a model trained by nnet

6. Model Evaluation
 Introduction
 Estimating model performance with k-fold cross-validation
 Performing cross-validation with the e1071 package
 Performing cross-validation with the caret package
 Ranking the variable importance with the caret package
 Ranking the variable importance with the rminer package
 Finding highly correlated features with the caret package
 Selecting features using the caret package
 Measuring the performance of the regression model
 Measuring prediction performance with a confusion matrix
 Measuring prediction performance using ROCR
 Comparing an ROC curve using the caret package
 Measuring performance differences between models with the caret package

7. Ensemble Learning
 Introduction
 Classifying data with the bagging method
 Performing cross-validation with the bagging method
 Classifying data with the boosting method
 Performing cross-validation with the boosting method
 Classifying data with gradient boosting
 Calculating the margins of a classifier
 Calculating the error evolution of the ensemble method
 Classifying data with random forest
 Estimating the prediction errors of different classifiers

8. Clustering
 Introduction
 Clustering data with hierarchical clustering
 Cutting trees into clusters
 Clustering data with the k-means method
 Drawing a bivariate cluster plot
 Comparing clustering methods
 Extracting silhouette information from clustering
 Obtaining the optimum number of clusters for k-means
 Clustering data with the density-based method
 Clustering data with the model-based method
 Visualizing a dissimilarity matrix
 Validating clusters externally

9. Association Analysis and Sequence Mining
 Introduction
 Transforming data into transactions
 Displaying transactions and associations
 Mining associations with the Apriori rule
 Pruning redundant rules
 Visualizing association rules
 Mining frequent itemsets with Eclat
 Creating transactions with temporal information
 Mining frequent sequential patterns with cSPADE

10. Dimension Reduction
 Introduction
 Performing feature selection with FSelector
 Performing dimension reduction with PCA
 Determining the number of principal components using the scree test
 Determining the number of principal components using the Kaiser method
 Visualizing multivariate data using biplot
 Performing dimension reduction with MDS
 Reducing dimensions with SVD
 Compressing images with SVD
 Performing nonlinear dimension reduction with ISOMAP
 Performing nonlinear dimension reduction with Local Linear Embedding

11. Big Data Analysis (R and Hadoop)
 Introduction
 Preparing the RHadoop environment
 Installing rmr2
 Installing rhdfs
 Operating HDFS with rhdfs
 Implementing a word count problem with RHadoop
 Comparing the performance between an R MapReduce program and a standard R program
 Testing and debugging the rmr2 program
 Installing plyrmr
 Manipulating data with plyrmr
 Conducting machine learning with RHadoop
 Configuring RHadoop clusters on Amazon EMR
 A. Resources for R and Machine Learning
 B. Dataset – Survival of Passengers on the Titanic

 A. Bibliography
 Index
Product information
 Title: R: Recipes for Analysis, Visualization and Machine Learning
 Author(s):
 Release date: November 2016
 Publisher(s): Packt Publishing
 ISBN: 9781787289598