Lab 4 – Amazon EMR

Amazon EMR (Elastic MapReduce) is based on Hadoop, a Java-based framework that supports the processing of large data sets in a distributed computing environment. MapReduce is Hadoop's programming model: it lets developers write programs that process massive amounts of unstructured data in parallel across a cluster of machines.
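To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count in plain Python. It is not EMR or Hadoop code; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster the framework runs many mappers and reducers in parallel on different nodes and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key.

    In Hadoop the framework does this between the map and reduce steps.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in S3.
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(sample))
```

Because each mapper looks at only one line and each reducer at only one key, the work can be split across machines without the pieces needing to coordinate, which is what makes the model suitable for large clusters.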

EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service) will also be used in this lab.

Elastic MapReduce:

Getting Started Tutorial: