R is fundamental to data science, and for anyone new to the field the overriding question is ‘where do I start?’

The following R tutorial sites have proved invaluable every year when teaching students who are new to R:

In 2010 Springer published Guide to Intelligent Data Analysis, a book with some excellent topics and features, including:

  • Guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring
  • Equips the reader with the necessary information in order to obtain hands-on experience of the topics under discussion
  • Provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms
  • Includes numerous examples using R and KNIME, together with appendices introducing the open source software
  • Integrates illustrations and case-study-style examples to support pedagogical exposition
  • Supplies further tools and information at the associated website: http://www.idaguide.net/

Data Quality for Data Mining

The following is an excellent presentation by Theodore Johnson of AT&T, delivered at Rutgers University on 12 February 2004. It remains as relevant today as it was then. The original presentation can be downloaded here.

CA1 – Anscombe’s Quartet


Write a report analysing Anscombe’s Quartet.

Describe the flaws this data set exposes in relying on Pearson’s correlation alone, without visualising the data.
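As a starting point, the sketch below computes the summary statistics that are (almost) identical across the four data sets: the means and variances of x and y, Pearson’s r, and the least-squares line. It is written in Python purely as an illustration; in R the same values are available from the built-in `anscombe` data frame using `mean()`, `var()`, `cor()` and `lm()`. The values are Anscombe’s published 1973 data; the helper names `pearson` and `summarise` are illustrative, not from any library.

```python
from statistics import mean, variance

# Anscombe's (1973) four data sets; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson's r: centred cross-products divided by the root of the centred sums of squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def summarise(xs, ys):
    """Summary statistics that Anscombe held (nearly) fixed across the quartet."""
    mx, my = mean(xs), mean(ys)
    # Ordinary least-squares slope and intercept of y on x.
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return {
        "mean_x": mx, "var_x": variance(xs),
        "mean_y": my, "var_y": variance(ys),
        "r": pearson(xs, ys),
        "slope": slope, "intercept": my - slope * mx,
    }

for name, (xs, ys) in QUARTET.items():
    stats = summarise(xs, ys)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

Every row prints r of about 0.816 and a fitted line of roughly y = 3 + 0.5x, yet the scatter plots look completely different, which is exactly the flaw the assignment asks you to describe.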

Describe each of the four charts in a blog post of roughly 500 words.

In your answer, also come up with a new set of data points that validates Anscombe’s work.
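One possible way to build such a data set is sketched below, under the assumption that "validating" Anscombe means producing a differently shaped cloud of points with the same mean, variance and correlation. The hypothetical helper `match_stats` takes any pattern, removes the part of it that is linearly explained by x (so the remainder is uncorrelated with x), and then mixes a linear trend and that remainder in the proportions r : √(1 − r²). Because the two parts are uncorrelated, the sample variance of y comes out at the target and Pearson’s r equals `target_r`. The targets (7.5, 4.125, 0.816) are Anscombe’s values; the zig-zag pattern and x values are arbitrary choices for illustration.

```python
from statistics import mean, stdev, variance

def pearson(xs, ys):
    """Pearson's r computed from centred sums of products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def match_stats(xs, shape, target_mean_y=7.5, target_var_y=4.125, target_r=0.816):
    """Hypothetical helper: y values that follow `shape` but hit the target
    sample mean, sample variance and Pearson r against xs."""
    mx, ms = mean(xs), mean(shape)
    dx = [x - mx for x in xs]
    # Least-squares slope of shape on x; subtracting it leaves a residual
    # that has zero covariance with x.
    b = sum(d * (s - ms) for d, s in zip(dx, shape)) / sum(d * d for d in dx)
    resid = [(s - ms) - b * d for d, s in zip(dx, shape)]
    sd_x, sd_e = stdev(xs), stdev(resid)
    if sd_e < 1e-12:
        raise ValueError("shape must not be a straight line in x")
    sd_y = target_var_y ** 0.5
    beta = target_r * sd_y / sd_x                   # weight on the linear trend
    c = sd_y * (1 - target_r ** 2) ** 0.5 / sd_e    # weight on the residual pattern
    return [target_mean_y + beta * d + c * e for d, e in zip(dx, resid)]

# Illustrative example: evenly spaced x values with a zig-zag pattern in y.
xs = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
zigzag = [(-1) ** i for i in range(len(xs))]
ys = match_stats(xs, zigzag)

print([round(y, 2) for y in ys])
print(f"mean_x={mean(xs):.2f} var_x={variance(xs):.2f} "
      f"mean_y={mean(ys):.2f} var_y={variance(ys):.2f} r={pearson(xs, ys):.3f}")
```

Plotting `xs` against `ys` shows a saw-tooth around the trend line rather than any of Anscombe’s four shapes, while the printed statistics match his. Rounding the y values to two decimal places, as Anscombe did, shifts the statistics only in the third decimal place.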