2013

10:16 hours

Lesson 1 Getting Started with R: R can only be used after installation, which fortunately is just as simple as installing any other program. In this lesson you learn where to download R, how to decide on the best version and how to install it, and you get familiar with its environment, using RStudio as a front end. We also take a look at the package system.

Lesson 2 The Basic Building Blocks in R: R is a flexible and robust programming language and using it requires understanding how it handles data. We learn about performing basic math in R, storing various types of data in variables—such as numeric, integer, character and time-based—and calling functions on the data.

Lesson 3 Advanced Data Structures in R: Like many other languages, R offers more complex storage mechanisms such as vectors, arrays, matrices and lists. We take a look at those, and the data.frame, a special storage type that strongly resembles a spreadsheet and is part of what makes working with data in R such a pleasure.

Lesson 4 Reading Data into R: Data is abundant in the world, so analyzing it is just a matter of getting the data into R. There are many ways of doing so, the most common being reading from a CSV or database. We cover these and also importing from other statistical tools, and scraping websites.

Lesson 5 Making Statistical Graphs: Visualizing data is a crucial part of data science both in the discovery phase and when reporting results. R has long been known for its capability to produce compelling plots, and Hadley Wickham’s ggplot2 package makes it even easier to produce better-looking graphics. We cover histograms, boxplots, scatterplots, line charts and more.

Lesson 6 Basics of Programming: R has all the standard components of a programming language such as functions, if statements and loops, all with their own caveats and quirks. We start with the requisite “Hello, World!” function and learn about arguments to functions, the regular if statement and the vectorized version, and how to build loops and why they should be avoided.

Lesson 7 Data Munging: Data scientists often bemoan that 80% of their work is manipulating data. As such, R has many tools for this, which are, contrary to what Python users may say, easy to use. We see how R excels at group operations using apply, lapply and the plyr package. We also take a look at its facilities for joining, combining and rearranging data.

Lesson 8 Manipulating Strings: Text data is becoming more pervasive in the world, and fortunately, R provides ways for both combining text and ripping it apart, which we walk through. We also examine R’s extensive regular expression capabilities.

Lesson 9 Basic Statistics: Naturally, R has all the basics when it comes to statistics such as means, variance, correlation, t-tests and ANOVAs. We look at all the different ways those can be computed.

Lesson 10 Linear Models: The workhorse of statistics is regression and its extensions. This consists of linear models, generalized linear models–including logistic and Poisson regression–and survival models. We look at how to fit these models in R and how to evaluate them using measures such as mean squared error, deviance and AIC.

Lesson 11 Other Models: Beyond regression there are many other types of models that can be fit to data. Models covered include regularization with the elastic net; Bayesian shrinkage; nonlinear models such as nonlinear least squares, splines and generalized additive models; and decision trees and random forests.

Lesson 12 Time Series: Special care must be taken with data where there is time-based correlation, otherwise known as autocorrelation. We look at some common methods for dealing with time series such as ARIMA, VAR and GARCH.

Lesson 13 Clustering: A focal point of modern machine learning is clustering, the partitioning of data into groups. We explore three popular methods: K-means, K-medoids and hierarchical clustering.

Lesson 14 Reports and Slideshows with knitr: Successfully delivering the results of an analysis can be just as important as the analysis itself, so it is important to communicate them in an effective way. This communication can take the form of a written report, a Web site of results, a slide show or a dashboard. In this lesson we focus on the first three, which are made remarkably easy using knitr, a package written by Yihui Xie.

Lesson 15 Package Building: Building packages is a great way to contribute back to the R community and doing so has never been easier thanks to Hadley Wickham’s devtools package. This lesson covers all the requirements for a package and how to go about authoring and distributing one.