Visualising Uber Ride Share Data in R

Fivethirtyeight.com have release a dataset to Kaggle that they received as a result of a series of freedom of information requests from the New York Taxi Commission (NYTC) called Uber Pickups in New York City. They want us kagglers to investigate the data and one kaggler Rob Harrand came up with a kernal called Uber-Duper animation. During our DBS Analytics meetup yesterday every single pun on uber was used and we looked and used a few of the kernals with some ideas on how we could improve on them.

uber

This can be generated with the following R-code.

Note image-magick must be installed on your computer to be able to do this.

To install on a mac:

On windows download Image Magick and make sure it is added to the path.

As a programmer immediately I realise that this can be improved – the animated gif being generated is only generating one month of data for 2014, but there are six months of data – so loading in the 6 data sets and binding them into one will allow me to create an image over 6 months – hey I could change the colours as the months change. I can give the x and y axes proper names for Longitude and Latitude. I notice also that the function is skipping every 25000 data points so that out of 1 million data points it is generating 40 images and merging them into 1 animated gif. When all six datasets are merged there are in excess of 4.5 million observations – this is 180 images merged into 1 gif file – with 40 images it was almost 2mb – so this could be a 9mb file. Perhaps I can generate per 250000 – so I could parameterise this offset and so I convert this call to the animation saveGif code into a function called generate uber plot, with parameterised colours for the months, offset to change the number of frames in the animation – thus creating the following animation

uber

Leave a Reply

Your email address will not be published. Required fields are marked *