Below are the links to the data files used in this lecture:
R is fundamental to data science, and for anyone new to the field the overriding question is ‘Where do I start?’
The following are links to R tutorial sites that I have found invaluable for teaching students new to R every year:
In 2010 Springer published the book Guide to Intelligent Data Analysis, which offers some excellent topics and features. The book:
- Guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring
- Equips the reader with the necessary information in order to obtain hands-on experience of the topics under discussion
- Provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms
- Includes numerous examples using R and KNIME, together with appendices introducing the open source software
- Integrates illustrations and case-study-style examples to support pedagogical exposition
- Supplies further tools and information at the associated website: http://www.idaguide.net/
In 2007 Earl F. Flynn published Using Colors in R for the Stowers Institute for Medical Research. It is a solid baseline reference for choosing and specifying colours in R data visualisations.
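To give a flavour of how colours are specified in R, here is a minimal sketch using only base R functions (`colors()`, `col2rgb()` and `rgb()`). It is not taken from Flynn's document, and the variable names `rgb_red` and `hex_red` are just illustrative.

```r
# List the first few of the colour names built into base R
head(colors(), 10)

# Convert a named colour into its red/green/blue components (0-255)
rgb_red <- col2rgb("red")
print(rgb_red)

# Build a colour from RGB components and get its hex string
hex_red <- rgb(255, 0, 0, maxColorValue = 255)
print(hex_red)
```

Either a colour name or a hex string can be passed to the `col` argument of plotting functions such as `plot()`.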
CA 1 – Anscombe’s Quartet
Write up a report and analysis of Anscombe’s Quartet.
Describe the flaws this data set exposes in relying on Pearson’s correlation alone, without visualising the data.
Describe each of the 4 charts in a blog post roughly 500 words in length.
In your answer, also come up with a new set of data points that validates Anscombe’s work.
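As a possible starting point for the analysis (a sketch, not a model answer), the quartet ships with base R as the data frame `anscombe`, with columns `x1` to `x4` and `y1` to `y4`. The snippet below computes Pearson's correlation for each of the four sets and then plots them side by side. The axis limits and the variable name `correlations` are choices made here for illustration.

```r
# Anscombe's quartet is included in base R's datasets package
data(anscombe)

# Pearson's correlation for each of the four x/y pairs
correlations <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(correlations) <- paste("Set", 1:4)
print(round(correlations, 3))

# Plot the four sets in a 2 x 2 grid with a fitted regression line
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i),
       xlim = c(3, 19), ylim = c(3, 13), pch = 19)
  abline(lm(y ~ x), col = "blue")
}
par(op)  # restore the previous plotting layout
```

All four correlations come out at roughly 0.816, yet the plots look very different, which is the point the assignment asks you to discuss.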