CA3 – Cluster Analysis and Nearest Neighbour

Find or create a dataset* of roughly 200 observations that is suitable for K-Means cluster analysis and K-Nearest Neighbour prediction.

* The dataset should be unique within your class.

Examine the dataset, then split it into a training set of a suitable size and a test set so you can measure how effective your model is.
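As a rough illustration of the split-then-evaluate idea, here is a minimal Python sketch (the tutorials themselves use R). The data is made up, and the helper names `make_toy_data`, `train_test_split`, `knn_predict` and `accuracy` are hypothetical, chosen only for this example. The 75/25 split and k = 5 are assumptions, not requirements of the assignment.

```python
import math
import random
from collections import Counter

def make_toy_data(n=200, seed=1):
    """Made-up data: two Gaussian blobs in 2-D, labelled 0 and 1."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 6.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, point, k=5):
    """Majority vote among the k training rows closest to point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

data = make_toy_data()
train, test = train_test_split(data)
print(len(train), len(test))  # 150 50
print(accuracy(train, test))
```

The test rows are never used when predicting, so the accuracy gives an honest estimate of how the model handles unseen observations.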

Follow the tutorials for K-Means Clustering and K-Nearest Neighbour.
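For orientation before working through the tutorials, the core loop of K-Means can be sketched in a few lines of Python. This is a simplified version of the standard assign-then-update iteration, not the R tutorial's code. The function name `kmeans`, the six sample points and the random initialisation are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iterations=50, seed=0):
    """Cluster points (tuples) into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Made-up points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Unlike K-Nearest Neighbour, K-Means uses no labels: it only groups observations by distance, so the cluster numbers it returns are arbitrary.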

Submit your completed work and summary as a classical paper.

ADA Lecture 12 – Machine Learning

Apriori is another useful algorithm to understand and be able to use. It is a data mining algorithm used in association analysis, which is often referred to as basket analysis or shopping basket analysis.
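To make the idea concrete, here is a small Python sketch of the level-wise Apriori search for frequent itemsets. It is a teaching version under stated assumptions: the baskets are made up, the `apriori` function and the 0.6 support threshold are chosen for illustration, and a real analysis would normally rely on an established package instead.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    size = 1
    while current:
        # Keep only the candidates that are frequent at this size.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent sets into larger candidates, then prune any candidate
        # with an infrequent subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent

# Made-up shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "nappies", "beer", "eggs"],
    ["milk", "nappies", "beer", "cola"],
    ["bread", "milk", "nappies", "beer"],
    ["bread", "milk", "nappies", "cola"],
]
frequent = apriori(baskets, min_support=0.6)

# Confidence of the rule {nappies} -> {beer}.
pair = frozenset(["nappies", "beer"])
confidence = frequent[pair] / frequent[frozenset(["nappies"])]
print(len(frequent), confidence)
```

Support measures how often an itemset appears across all baskets. Confidence measures how often the right-hand side of a rule appears in baskets that already contain the left-hand side.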

An Apriori Algorithm R Example

Lab 5 – Time Series

R has extensive facilities for analyzing time series data. This section describes the creation of a time series, seasonal decomposition, modeling with exponential and ARIMA models, and forecasting with the forecast package.
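As a taste of what the exponential models do, simple exponential smoothing can be sketched in a few lines of Python. This is a hand-rolled illustration, not the forecast package's implementation. The function name, the four-value series and alpha = 0.5 are all assumptions chosen so the arithmetic is easy to check by hand.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The final level doubles as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for y in series[1:]:
        # New level = weighted average of the new value and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

series = [10, 12, 11, 13]  # made-up observations
levels = simple_exponential_smoothing(series, alpha=0.5)
print(levels)      # [10, 11.0, 11.0, 12.0]
print(levels[-1])  # 12.0, the forecast for the next period
```

A larger alpha makes the forecast react faster to recent values, while a smaller alpha gives a smoother, slower-moving level.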

See the time series post on statsmethods.

Or see my personal favourite, A Little Book of R for Time Series, which has excellent data examples ranging from successive Kings of England to Australian souvenir shops and hem sizes on skirts.

R Cheatsheets

R has a lot of APIs and plugin libraries, which can make it hard to remember what everything does. The following is a list of cheat sheets for R, Python, NumPy, SciPy, and pandas covering regression analysis, machine learning, predictive analytics, and whatever else might be relevant and interesting:

R Cheat Sheets for Everything

Ref Card in R for Regression Analysis

Ricci Ref Card for Time Series Data

R Cheat Sheets with Quandl File

R ggplot2 Cheat Sheet