## Santa Needs DBS’s Help

Welcome to DBS Analytics Society

Join at the following URL:

Why are we here?

• Because We Love Data – isn’t that right? Yes
• Where to begin -> https://www.kaggle.com/
• The home of data science – create an account
• Pick any problem set and try to solve
• Our brains
• Our hypothesis
• Practice, practice, practice
• Be meticulous

What is a Hackathon?

• https://hackathon.guide/
• Hacking – Creative Problem Solving
• Hackathon – coming together to solve problems
• Parallel track for workshops going forward
• Groups of 2-5 people – dive into the problem
• Positive Energy Only
• Welcome Newcomers
• Learn Something New
• Solve Problems That Interest You
• Imposter Syndrome – You Belong Here – We are all beginners

Good Hackathons

• Clearly Articulated Problem Set
• Attainable
• Easy to Onboard Newcomers
• Led By A Stakeholder
• Organized

Process The Data – what to calculate?

• Take the input files and process them
• Turn them into a dataset form that we can work with
• Refactor and improve
• Test
• Search for Data Analytics Tools
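
The steps above can be sketched in a few lines. This is only a minimal sketch using Python's standard library: the two data rows are made up for illustration, `load_gifts` is a hypothetical helper name, and the column names (GiftId, Latitude, Longitude, Weight) are assumed to match the gifts.csv file used later in this deck.

```python
import csv
import io

# Made-up rows standing in for an input file.
# With real data, pass open("gifts.csv", newline="") instead.
RAW = """GiftId,Latitude,Longitude,Weight
1,16.35,6.30,1.0
2,12.49,28.63,15.5
"""

def load_gifts(handle):
    """Read CSV rows and convert each text field to a usable Python type."""
    gifts = []
    for row in csv.DictReader(handle):
        gifts.append({
            "GiftId": int(row["GiftId"]),
            "Latitude": float(row["Latitude"]),
            "Longitude": float(row["Longitude"]),
            "Weight": float(row["Weight"]),
        })
    return gifts

gifts = load_gifts(io.StringIO(RAW))
total_weight = sum(g["Weight"] for g in gifts)
print(len(gifts), total_weight)
```

Once the rows are typed values rather than text, the "what to calculate?" step becomes ordinary arithmetic that can be tested and refactored.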

Is Programming Required?

• So as not to scare you – No
• But then a graphical transformation tool is required, because you can’t program
• Realistically, when I want to process Big Data – programming is required
• Python
• R
• Parallelization for Big Data
• Map Reduce
• In-memory computing / databases
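
To make the Map Reduce idea concrete, here is a minimal single-machine sketch. The weights are made up, and the helper names (`chunked`, `map_step`, `reduce_step`) are hypothetical. In a real Big Data setting the map step would run on many machines in parallel; here it runs in one process so the shape of the computation is easy to see.

```python
from functools import reduce

def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_step(chunk):
    """Summarise one chunk independently: (gift count, total weight)."""
    return (len(chunk), sum(chunk))

def reduce_step(a, b):
    """Combine two partial summaries into one."""
    return (a[0] + b[0], a[1] + b[1])

weights = [1.0, 15.5, 3.0, 7.5, 2.0]  # made-up gift weights

# Map: each chunk is processed on its own, so chunks could go to different workers.
partials = map(map_step, chunked(weights, 2))

# Reduce: fold the partial results into one overall answer.
count, total = reduce(reduce_step, partials, (0, 0.0))
print(count, total)
```

Because `map_step` never looks outside its own chunk, the chunks can be handled in any order, which is what makes the approach easy to parallelize.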

Helping Santa

• https://www.kaggle.com/c/santas-stolen-sleigh
• Read the requirements – calculate weighted reindeer weariness
• Unlimited number of trips
• Head back to the North Pole after each trip
• Haversine formula used to calculate distance
• Get the data
• sample_submission.csv
• Check out Kaggle solutions / forums / ideas – people help each other
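
For reference, the Haversine distance mentioned above can be written with the standard library alone. This is a sketch, not the competition's own code: `haversine_km` is a hypothetical name, and the Earth radius of 6371 km is an assumed mean value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))

north_pole = (90, 0)
equator_point = (0, 0)
print(haversine_km(north_pole, equator_point))  # a quarter of the way around the globe
```

The distance from the North Pole depends only on latitude, so every trip starts and ends with a leg whose length is fixed by how far south its first and last gifts are.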

Excel

• Weight allowed in sleigh
• Number of trips required?
• Best Case / Worst Case?
• Sample Solution Case? 5000 Buckets with 20 gifts in each bucket
• How to solve better?
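
The questions above come down to simple arithmetic, which can be done in Excel or sketched in Python. The block below assumes the `weight_limit` of 1000 and `sleigh_weight` of 10 used in reindeer.py; the example weights are hypothetical and `trip_bounds` is a made-up helper name.

```python
from math import ceil

weight_limit = 1000   # as in reindeer.py
sleigh_weight = 10    # as in reindeer.py
capacity = weight_limit - sleigh_weight  # gift weight a single trip can carry

def trip_bounds(gift_weights):
    """Return (fewest trips possible by weight alone, most trips with one gift each)."""
    fewest = ceil(sum(gift_weights) / capacity)
    most = len(gift_weights)
    return fewest, most

example_weights = [400, 400, 400, 300, 300, 180]  # hypothetical gift weights
print(trip_bounds(example_weights))

# The sample solution uses 5000 buckets of 20 gifts each.
sample_gift_count = 5000 * 20
print(sample_gift_count)
```

The first number is only a lower bound: gifts cannot be split, so a real packing may need more trips than the total weight suggests.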

Solving With Python

• Create reindeer.py

```python
north_pole = (90, 0)   # (latitude, longitude)
weight_limit = 1000    # maximum total weight, sleigh included
sleigh_weight = 10
```

```python
import pandas as pd
import numpy as np
# Assumes a haversine module that exposes distance(a, b).
# The PyPI "haversine" package names this function haversine instead.
from haversine import distance


def weighted_trip_length(stops, weights):
    tuples = [tuple(x) for x in stops.values]
    # adding the last trip back to north pole, with just the sleigh weight
    tuples.append(north_pole)
    weights.append(sleigh_weight)

    dist = 0.0
    prev_stop = north_pole
    prev_weight = sum(weights)
    for i in range(len(tuples)):
        # each leg costs its distance times the weight carried on that leg
        dist = dist + distance(tuples[i], prev_stop) * prev_weight
        prev_stop = tuples[i]
        prev_weight = prev_weight - weights[i]
    return dist


def weighted_reindeer_weariness(all_trips):
    uniq_trips = all_trips.TripId.unique()

    if any(all_trips.groupby('TripId').Weight.sum() + sleigh_weight > weight_limit):
        raise Exception("One of the sleighs over weight limit!")

    dist = 0.0
    for t in uniq_trips:
        this_trip = all_trips[all_trips.TripId == t]
        dist = dist + weighted_trip_length(this_trip[['Latitude', 'Longitude']],
                                           this_trip.Weight.tolist())
    return dist


gifts = pd.read_csv('gifts.csv')
sample_sub = pd.read_csv('sample_submission.csv')

# attach each gift's location and weight to its assigned trip
all_trips = sample_sub.merge(gifts, on='GiftId')
```

```python
print(weighted_reindeer_weariness(all_trips))
```

Visualise – See Map Data

Take the gifts.csv file and load it into Google Fusion Tables

reindeer.py

The Sample Solution Value – 144525525772

Can You Improve on This?
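
One possible starting point, offered only as a sketch and not as a known winning method: sort the gifts by longitude so that each trip stays in a narrow slice of the globe, then fill each sleigh until the weight limit is reached. The function name `assign_trips`, the tuple layout, and the four example gifts are all hypothetical, and nothing here guarantees a better score than the sample solution.

```python
weight_limit = 1000   # as in reindeer.py
sleigh_weight = 10    # as in reindeer.py
capacity = weight_limit - sleigh_weight  # gift weight a single trip can carry

def assign_trips(gifts):
    """Greedy packing: walk gifts in longitude order, start a new trip when full.

    `gifts` is a list of (gift_id, latitude, longitude, weight) tuples.
    Assumes no single gift weighs more than `capacity`.
    Returns a dict mapping gift_id -> trip_id.
    """
    trips = {}
    trip_id = 0
    load = 0.0
    for gift_id, _lat, _lon, weight in sorted(gifts, key=lambda g: g[2]):
        if load + weight > capacity:
            # current sleigh is full, so begin the next trip
            trip_id += 1
            load = 0.0
        trips[gift_id] = trip_id
        load += weight
    return trips

# Made-up gifts: two near longitude -50, two near longitude 120.
gifts = [
    (1, 10.0, -50.0, 600.0),
    (2, 20.0, 120.0, 500.0),
    (3, -5.0, -48.0, 300.0),
    (4, 40.0, 118.0, 400.0),
]
print(assign_trips(gifts))
```

The resulting GiftId to TripId mapping has the same shape as sample_submission.csv, so it could be scored with `weighted_reindeer_weariness` and compared against the sample value.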