Johns Hopkins R Programming Course introduces SWIRL

While reviewing Coursera’s excellent Johns Hopkins R Programming course this evening, delivered by Roger D. Peng, I was introduced to swirl.

What, I hear you ask, is swirl? It is an interactive R tutorial that runs from within R or RStudio.

Previously I’ve been getting my students to run the Try R interactive tutorial from Code School, but I think from now on it will be swirl.

To start the swirl interactive R tutorial, open R or RStudio, type the following, and enjoy the hours spent practising R:
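A minimal sketch of the usual commands, assuming swirl is installed from CRAN (the install line only needs to be run once):

```r
# One-off install from CRAN (needs an internet connection)
install.packages("swirl")

# Load the package and start the interactive tutorial;
# swirl then asks for your name and which course to work through
library(swirl)
swirl()
```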

I must say I am really impressed with this course, and it only costs $43 per month. I managed to get through the first 3 weeks of lectures, quizzes, and practicals this evening. This is the second of nine courses in this specialisation. I am so impressed that I bought Professor Peng’s books, course notes, videos, and datasets. If there were t-shirts, I would have bought one too.

Visualising Uber Ride Share Data in R

FiveThirtyEight has released a dataset on Kaggle called Uber Pickups in New York City, which it received as a result of a series of freedom of information requests to the New York City Taxi & Limousine Commission (TLC). They want us Kagglers to investigate the data, and one Kaggler, Rob Harrand, came up with a kernel called Uber-Duper animation. During our DBS Analytics meetup yesterday every single pun on uber was used, and we looked at and used a few of the kernels, with some ideas on how we could improve on them.

[Animated GIF: Uber pickups in New York City]

This can be generated with the following R code.

Note: ImageMagick must be installed on your computer for this to work.

To install on a Mac, one option (assuming Homebrew is available) is to run brew install imagemagick in a terminal.

On Windows, download ImageMagick and make sure it is added to the PATH.

As a programmer, I immediately realise that this can be improved. The animated GIF is generated from only one month of data for 2014, but there are six months of data, so loading in the six datasets and binding them into one will allow me to create an animation covering all six months. Hey, I could even change the colours as the months change. I can also give the x and y axes proper names, Longitude and Latitude.

I notice also that the function steps through the data 25,000 points at a time, so that out of 1 million data points it generates 40 images and merges them into one animated GIF. When all six datasets are merged there are in excess of 4.5 million observations, which means 180 images merged into one GIF file. With 40 images the file was almost 2 MB, so this could be a 9 MB file. Perhaps I can generate a frame per 250,000 points instead. So I parameterise this offset and convert the call to the animation package's saveGIF code into a function called generate uber plot, with parameterised colours for the months and an offset to change the number of frames in the animation, thus creating the following animation:

[Animated GIF: Uber pickups in New York City over six months]
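The parameterised function described above could be sketched roughly as follows. This is not the original kernel's code: the names frame_ends and generate_uber_plot, the month column, the default colours, and the output file name are illustrative assumptions, and the Lon and Lat column names are assumed to match the Kaggle CSV files. Drawing uses base graphics inside saveGIF from the animation package, which relies on ImageMagick.

```r
# Hypothetical helper: the row index at which each frame ends when one
# frame is drawn for every `offset` additional rows.
frame_ends <- function(n_rows, offset) {
  pmin(seq_len(ceiling(n_rows / offset)) * offset, n_rows)
}

length(frame_ends(1e6, 25000))     # 40 frames: one month, original step
length(frame_ends(4.5e6, 25000))   # 180 frames: six months, same step
length(frame_ends(4.5e6, 250000))  # 18 frames: six months, coarser step

# Hypothetical function: `rides` is assumed to be a data frame with columns
# Lon, Lat and month (an integer 1..6 that indexes into `month_colours`).
generate_uber_plot <- function(rides, offset = 250000,
                               month_colours = c("red", "orange", "gold",
                                                 "green", "blue", "purple"),
                               gif_name = "uber.gif") {
  if (!requireNamespace("animation", quietly = TRUE)) {
    stop("the 'animation' package is needed to write the GIF")
  }
  # Fix the axis limits so every frame shares the same view of the city
  x_range <- range(rides$Lon)
  y_range <- range(rides$Lat)
  animation::saveGIF({
    for (end in frame_ends(nrow(rides), offset)) {
      shown <- rides[seq_len(end), ]  # all pickups up to this frame
      plot(shown$Lon, shown$Lat, pch = ".",
           col = month_colours[shown$month],
           xlim = x_range, ylim = y_range,
           xlab = "Longitude", ylab = "Latitude")
    }
  }, movie.name = gif_name, interval = 0.2)
}
```

The six monthly data frames can be stacked with rbind before calling the function, with a month column added to each one first. A larger offset gives fewer frames and therefore a smaller GIF.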

ADA Lecture 12 – Machine Learning

Apriori is another useful algorithm to understand and be able to use. It is a data mining algorithm used in association analysis, which is often referred to as basket analysis or shopping basket analysis.

An Apriori algorithm example in R:
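A minimal sketch, assuming the arules package (a common choice for association rules in R, though not necessarily the one used in the lecture) and a made-up set of four shopping baskets:

```r
library(arules)  # install.packages("arules") if it is not yet installed

# Toy shopping baskets, invented for illustration
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter")
)
trans <- as(baskets, "transactions")

# Keep rules whose items occur together in at least half of the baskets
# (support) and whose right-hand side is present in at least 60% of the
# baskets containing the left-hand side (confidence).
# minlen = 2 drops rules with an empty left-hand side.
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.6, minlen = 2))

# Show the rules, highest lift first
inspect(sort(rules, by = "lift"))
```

The support and confidence thresholds here are arbitrary; on real basket data they usually need to be much lower to find anything.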

Lab 5 – Time Series

R has extensive facilities for analyzing time series data. This section describes the creation of a time series, seasonal decomposition, modeling with exponential and ARIMA models, and forecasting with the forecast package.
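The steps listed above can be sketched on the AirPassengers dataset that ships with R. Note that this sketch is an assumption about how the lab might proceed and uses only base R's stats functions, so predict() stands in for the forecast package's forecast(); the model orders are the classic "airline model" rather than anything tuned.

```r
# Create a monthly time series (rebuilt from raw numbers to show ts() itself)
y <- ts(as.numeric(AirPassengers), start = c(1949, 1), frequency = 12)

# Seasonal decomposition into trend, seasonal and random components
parts <- decompose(y, type = "multiplicative")
# plot(parts)  # uncomment to see the three components

# Exponential smoothing (Holt-Winters) and a 12-month-ahead prediction
hw <- HoltWinters(y, seasonal = "multiplicative")
hw_next_year <- predict(hw, n.ahead = 12)

# Seasonal ARIMA on the log scale, then back-transform the predictions
fit <- arima(log(y), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
arima_next_year <- exp(predict(fit, n.ahead = 12)$pred)
```

With the forecast package installed, forecast(fit, h = 12) would give similar predictions together with prediction intervals.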

See the post on statsmethods for time series here.

Or, my personal favourite, with excellent data examples ranging from successive kings of England to Australian souvenir shops and hem sizes on skirts: see A Little Book of R for Time Series.

R Cheatsheets

R has a lot of APIs and add-on packages, which can make it hard to remember what everything does. The following is a list of cheat sheets for R, Python, NumPy, SciPy, and pandas covering regression analysis, machine learning, predictive analytics, and whatever else might be relevant and interesting:

R Cheat Sheets for Everything

Ref Card in R for Regression Analysis

Ricci Ref Card for Time Series Data

R Cheat Sheets with Quandl File

R ggplot2 Cheat Sheet