Santa Needs DBS’s Help

Welcome to
DBS Analytics Society

Join at the following URL:

https://docs.google.com/forms/d/1oxG7f_jFdXYiDRhOFBroviZpXfTkw_igPpClFF8YOZQ/viewform?usp=send_form

Why are we here?

  • Because We Love Data – isn’t that right? Yes
  • Where to begin -> https://www.kaggle.com/
    • The home of data science – create an account
  • Pick any problem set and try to solve
  • 2 biggest gifts
    • Our brains
    • Our hypothesis
  • Practice, practice, practice
  • Be meticulous

What is a Hackathon?

  • https://hackathon.guide/
  • Hacking – Creative Problem Solving
  • Hackathon – coming together to solve problems
  • Parallel track for workshops going forward
  • Groups of 2-5 people – dive into the problem
  • Positive Energy Only
  • Welcome Newcomers
  • Learn Something New
  • Solve Problems That Interest You
  • Imposter Syndrome – You Belong Here – We are all beginners

Good Hackathons

  • Clearly Articulated Problem Set
  • Attainable
  • Easy to Onboard Newcomers
  • Led By A Stakeholder
  • Organized

Process The Data – what to calculate?

  • Take input files and process
  • Turn them into a dataset form that we can process
  • Refactor and improve
  • Test
  • Yes, Google Search is your friend – lecturer’s hat removed
  • Search for Data Analytics Tools
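A minimal sketch of the first two steps above (take an input file, turn it into a dataset we can process), using only the Python standard library. The column names follow the ones used by the reindeer.py code later in this deck; the three rows are made-up sample values, not real competition data, and `load_gifts` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows in the shape of gifts.csv; the values are illustrative only.
SAMPLE_CSV = """GiftId,Latitude,Longitude,Weight
1,10.0,20.0,5.5
2,-30.5,100.25,12.0
3,60.0,-45.0,1.0
"""

def load_gifts(handle):
    """Parse CSV text into a list of dicts with numeric fields."""
    gifts = []
    for row in csv.DictReader(handle):
        gifts.append({
            "GiftId": int(row["GiftId"]),
            "Latitude": float(row["Latitude"]),
            "Longitude": float(row["Longitude"]),
            "Weight": float(row["Weight"]),
        })
    return gifts

# For a real file, pass open("gifts.csv", newline="") instead of the StringIO.
gifts = load_gifts(io.StringIO(SAMPLE_CSV))
total_weight = sum(g["Weight"] for g in gifts)
print(len(gifts), total_weight)
```

Once the rows are typed records, the "what to calculate?" question becomes a matter of writing small functions over that list, which are easy to refactor and test.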

Is Programming Required?

  • In order not to scare you – No
    • but then a graphical transformation tool is required, because you can’t program
  • Realistically, when I want to process Big Data – Programming is Required
    • Python
    • R
  • Parallelization for Big Data
    • Multithread
    • Map Reduce
    • In Memory computing / databases
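To make the multithread and Map Reduce bullets concrete, here is a small sketch: split the data into chunks, compute a partial result per chunk in a thread pool (the map step), then combine the partial results (the reduce step). The function names and the toy list of weights are illustrative assumptions. For CPU-heavy work in CPython a process pool is usually the better fit; threads are used here only to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_step(chunk):
    """Map: compute a partial result for one chunk (here, a partial sum)."""
    return sum(chunk)

def reduce_step(left, right):
    """Reduce: combine two partial results into one."""
    return left + right

weights = list(range(1, 101))   # toy data standing in for gift weights
chunks = chunked(weights, 25)   # four chunks of 25 items each

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_step, chunks))

total = reduce(reduce_step, partials, 0)
print(partials, total)
```

The same split / map / combine shape carries over to larger frameworks; only the machinery that distributes the chunks changes.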

Helping Santa

  • https://www.kaggle.com/c/santas-stolen-sleigh
  • Read the requirements – calculate weighted reindeer weariness
    • Unlimited number of trips
    • Head back to the North Pole after each trip
    • Haversine formula used to calculate distance
  • Get the data
    • gifts.csv
    • sample_submission.csv
  • Check out Kaggle solutions / forums / ideas – people help each other
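The requirements above can be sketched in plain Python: a Haversine distance between two (latitude, longitude) points, and the weariness of a single trip, where each leg's distance is multiplied by the weight still on the sleigh. The Earth radius of 6371 km is an assumed mean value, and the function names are hypothetical; the sleigh weight of 10 matches the reindeer.py code in this deck.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
NORTH_POLE = (90.0, 0.0)   # (latitude, longitude) in degrees
SLEIGH_WEIGHT = 10.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def trip_weariness(stops, weights):
    """Weighted distance for one trip: pole -> each stop in order -> pole."""
    carried = sum(weights) + SLEIGH_WEIGHT
    position = NORTH_POLE
    total = 0.0
    for stop, weight in zip(stops, weights):
        total += haversine_km(position, stop) * carried
        carried -= weight          # the gift has been delivered
        position = stop
    # Final leg home carries only the empty sleigh.
    total += haversine_km(position, NORTH_POLE) * carried
    return total

pole_to_equator = haversine_km(NORTH_POLE, (0.0, 0.0))
one_gift_trip = trip_weariness([(0.0, 0.0)], [1.0])
print(round(pole_to_equator, 2), round(one_gift_trip, 2))
```

For the one-gift trip, the outbound leg carries 11 units (gift plus sleigh) and the return leg carries 10, so the result is 21 times the pole-to-equator distance.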

Excel

  • How many gifts?
  • Weight allowed in sleigh?
  • Number of trips required?
  • Best Case / Worst Case?
  • Sample Solution Case? 5000 Buckets with 20 gifts in each bucket
  • How to solve better?
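The same back-of-the-envelope questions can be answered in a few lines of Python. This is a sketch under stated assumptions: the weight limit (1000) and sleigh weight (10) are taken from the reindeer.py code in this deck, the helper names are hypothetical, and the three example weights are made up. The bounds are on the number of trips only; fewer trips does not by itself guarantee lower weariness.

```python
import math

WEIGHT_LIMIT = 1000    # from the deck's reindeer.py
SLEIGH_WEIGHT = 10

def min_trips_by_weight(weights):
    """Lower bound on trips: total gift weight divided by gift capacity."""
    capacity = WEIGHT_LIMIT - SLEIGH_WEIGHT
    return math.ceil(sum(weights) / capacity)

def max_trips(weights):
    """Upper bound on trips: one gift per trip."""
    return len(weights)

# Sample solution shape from the slide: 5000 buckets with 20 gifts each.
gifts_in_sample = 5000 * 20

example_weights = [500, 500, 500]   # made-up weights for illustration
lower = min_trips_by_weight(example_weights)
upper = max_trips(example_weights)
print(gifts_in_sample, lower, upper)
```

In the example the lower bound is 2 trips, but no two of the gifts fit together under the capacity of 990, so 3 trips are actually needed. The bound is a sanity check, not a plan.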

Solving With Python

  • Create reindeer.py


import pandas as pd
import numpy as np
# The PyPI "haversine" package exposes haversine(); it is aliased to distance
# here. If you use a local haversine.py that defines distance(), import that instead.
from haversine import haversine as distance

north_pole = (90, 0)
weight_limit = 1000
sleigh_weight = 10

def weighted_trip_length(stops, weights):
    # stops: DataFrame of Latitude/Longitude; weights: list of gift weights
    tuples = [tuple(x) for x in stops.values]
    # adding the last trip back to north pole, with just the sleigh weight
    tuples.append(north_pole)
    weights.append(sleigh_weight)

    dist = 0.0
    prev_stop = north_pole
    prev_weight = sum(weights)
    for i in range(len(tuples)):
        # each leg costs its distance times the weight still being carried
        dist = dist + distance(tuples[i], prev_stop) * prev_weight
        prev_stop = tuples[i]
        prev_weight = prev_weight - weights[i]
    return dist

def weighted_reindeer_weariness(all_trips):
    uniq_trips = all_trips.TripId.unique()

    # every trip must fit under the limit, including the sleigh itself
    if any(all_trips.groupby('TripId').Weight.sum() + sleigh_weight > weight_limit):
        raise Exception("One of the sleighs over weight limit!")

    dist = 0.0
    for t in uniq_trips:
        this_trip = all_trips[all_trips.TripId == t]
        dist = dist + weighted_trip_length(this_trip[['Latitude', 'Longitude']], this_trip.Weight.tolist())

    return dist

gifts = pd.read_csv('gifts.csv')
sample_sub = pd.read_csv('sample_submission.csv')

# join each gift's location and weight onto its assigned trip
all_trips = sample_sub.merge(gifts, on='GiftId')

print(weighted_reindeer_weariness(all_trips))

Visualise – See Map Data

Take the gifts.csv file – apply to Fusion Tables

reindeer.py

The Sample Solution Value – 144525525772

Can You Improve on This?
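One possible starting idea, offered as a sketch rather than a known winner: sort the gifts by longitude so each trip stays in a narrow north-south strip, then fill each sleigh greedily until the next gift would exceed the limit. The function name and the four sample gifts are made up for illustration, and the sketch assumes no single gift is heavier than the gift capacity of 990. Whether it beats the sample value would need to be checked by scoring the result with weighted_reindeer_weariness from reindeer.py.

```python
WEIGHT_LIMIT = 1000    # from the deck's reindeer.py
SLEIGH_WEIGHT = 10

def assign_trips_by_longitude(gifts):
    """Greedy packing. gifts: list of (gift_id, lat, lon, weight) tuples.

    Returns a dict mapping gift_id to trip_id.
    """
    capacity = WEIGHT_LIMIT - SLEIGH_WEIGHT
    assignment = {}
    trip_id = 0
    load = 0.0
    for gift_id, _lat, _lon, weight in sorted(gifts, key=lambda g: g[2]):
        if load + weight > capacity:
            # The current sleigh is full: start a new trip.
            trip_id += 1
            load = 0.0
        assignment[gift_id] = trip_id
        load += weight
    return assignment

# Made-up gifts: (gift_id, latitude, longitude, weight)
sample_gifts = [
    (1, 0.0, 50.0, 600.0),
    (2, 0.0, -10.0, 500.0),
    (3, 0.0, 10.0, 400.0),
    (4, 0.0, 60.0, 300.0),
]
assignment = assign_trips_by_longitude(sample_gifts)
print(assignment)
```

The order of stops within each trip is left open here; visiting them from north to south is one option to try next.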