**Welcome to**

**DBS Analytics Society**

Join at the following URL:

https://docs.google.com/forms/d/1oxG7f_jFdXYiDRhOFBroviZpXfTkw_igPpClFF8YOZQ/viewform?usp=send_form

**Why are we here?**

- Because We Love Data – isn’t that right? Yes
- Where to begin -> https://www.kaggle.com/
- The home of data science – create an account

- Pick any problem set and try to solve it
- 2 biggest gifts
- Our brains
- Our hypothesis

- Practice, practice, practice
- Be meticulous

**What is a Hackathon?**

- https://hackathon.guide/
- Hacking – Creative Problem Solving
- Hackathon – coming together to solve problems
- Parallel track for workshops going forward
- Groups of 2-5 people – dive into the problem
- Positive Energy Only
- Welcome Newcomers
- Learn Something New
- Solve Problems That Interest You
- Imposter Syndrome – You Belong Here – We are all beginners

**Good Hackathons**

- Clearly Articulated Problem Set
- Attainable
- Easy to Onboard Newcomers
- Led By A Stakeholder
- Organized

**Process The Data – what to calculate?**

- Take the input files and process them
- Turn them into a dataset format that we can work with
- Refactor and improve
- Test
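
The steps above can be sketched in a few lines of Python. This is only an illustration, not a prescribed workflow: the column names follow the ones used for gifts.csv in reindeer.py, while the sample rows and the `load_gifts` helper are made up for the example.

```python
import csv
import io

# Made-up sample rows standing in for a real input file.
RAW = """GiftId,Latitude,Longitude,Weight
1,16.35,6.30,1.0
2,12.49,28.63,15.5
3,27.79,60.03,8.1
"""

def load_gifts(handle):
    """Read CSV text and turn each row into a dictionary with typed values."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "GiftId": int(row["GiftId"]),
            "Latitude": float(row["Latitude"]),
            "Longitude": float(row["Longitude"]),
            "Weight": float(row["Weight"]),
        })
    return rows

gifts = load_gifts(io.StringIO(RAW))

# A simple test: every row was parsed and the weights are numeric.
assert len(gifts) == 3
total_weight = sum(g["Weight"] for g in gifts)
print(len(gifts), "gifts, total weight", total_weight)
```

Keeping the parsing in one small function makes the "refactor and improve" step easier, because the rest of the analysis only ever sees clean, typed rows.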

- Process with any tools available
- http://www.computerworld.com/article/2502891/business-intelligence/business-intelligence-8-cool-tools-for-data-analysis-visualization-and-presentation.html

- Yes Google Search is your friend – lecturer’s hat removed
- Search for Data Analytics Tools

**Is Programming Required?**

- To avoid scaring you – no
- But if you can’t program, you will need a graphical transformation tool instead

- Realistically, when I want to process Big Data – programming is required
- Python
- R

- Parallelization for Big Data
- Multithread
- Map Reduce
- In Memory computing / databases
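
As a rough illustration of the map/reduce idea combined with threads, the sketch below splits a list into chunks, summarises each chunk in a worker, and then merges the partial results. The data and the helper names (`chunked`, `map_chunk`, `combine`) are made up for the example; real Big Data tools distribute the same pattern across many machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_chunk(chunk):
    """Map step: summarise one chunk as (count, total weight)."""
    return (len(chunk), sum(chunk))

def combine(left, right):
    """Reduce step: merge two partial summaries into one."""
    return (left[0] + right[0], left[1] + right[1])

weights = [1.0, 15.5, 8.0, 2.5, 40.0, 3.0, 7.0]  # made-up data

# Threads keep the sketch short; CPU-bound work in CPython would
# normally use processes instead because of the global interpreter lock.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_chunk, chunked(weights, 3)))

count, total = reduce(combine, partials, (0, 0.0))
print(count, total)
```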

**Helping Santa**

- https://www.kaggle.com/c/santas-stolen-sleigh
- Read the requirements – calculate weighted reindeer weariness
- Unlimited number of trips
- Head back to the North Pole after each trip
- Haversine formula used to calculate distance

- Get the data
- gifts.csv
- sample_submission.csv

- Check out Kaggle solutions / forums / ideas – people help each other
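
For reference, the Haversine distance mentioned above can be written with the standard library alone. This is a minimal sketch: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for the example, and points are taken as (latitude, longitude) pairs in degrees.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    # Clamp to 1.0 so rounding error cannot push sqrt(a) outside asin's domain.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))

north_pole = (90, 0)
# North Pole to a point on the equator: a quarter of the way around the globe.
print(haversine_km(north_pole, (0, 0)))
```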

**Excel**

- How many gifts?
- Weight allowed in sleigh?
- Number of trips required?
- Best Case / Worst Case?
- Sample Solution Case? 5000 Buckets with 20 gifts in each bucket
- How to solve better?
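
The same back-of-the-envelope questions can be answered in a few lines of Python. This is a sketch on made-up gift weights; the limits are the ones used in reindeer.py, and treating the usable capacity as the weight limit minus the sleigh weight is an assumption that follows the check in that script.

```python
import math

weight_limit = 1000   # limit used in reindeer.py
sleigh_weight = 10    # sleigh weight used in reindeer.py

gift_weights = [1.0, 15.5, 8.0, 50.0, 400.0, 600.0, 250.0]  # made-up data

# Assumption: gifts on one trip may weigh at most the limit minus the sleigh.
capacity = weight_limit - sleigh_weight

n_gifts = len(gift_weights)
total_weight = sum(gift_weights)

# Best case: every trip is filled to capacity (a lower bound on trips).
min_trips = int(math.ceil(total_weight / capacity))
# Worst case: one gift per trip.
max_trips = n_gifts

# The sample solution: 5000 buckets with 20 gifts in each bucket.
sample_gifts = 5000 * 20

print(n_gifts, total_weight, min_trips, max_trips, sample_gifts)
```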

**Solving With Python**

- https://www.kaggle.com/wendykan/santas-stolen-sleigh/computing-weighted-reindeer-weariness
- Potential solutions for weighted reindeer algorithm
- https://gist.github.com/rochacbruno/2883505
- Simple Haversine in Python
- Usually Python 2.7 – install it, or run it if already installed
- Python – because I can
- Data analytics in Python using pandas, NumPy, SciPy
- http://pandas-docs.github.io/pandas-docs-travis/install.html#installing-pandas-with-anaconda
- Install with Anaconda – so you need to download that first
- https://www.continuum.io/downloads
- https://www.continuum.io/downloads#_macosx

- Create reindeer.py

```python
import pandas as pd
import numpy as np
# Assumes the Haversine gist linked above is saved locally as haversine.py
from haversine import distance

north_pole = (90, 0)
weight_limit = 1000
sleigh_weight = 10


def weighted_trip_length(stops, weights):
    tuples = [tuple(x) for x in stops.values]
    # adding the last trip back to north pole, with just the sleigh weight
    tuples.append(north_pole)
    weights.append(sleigh_weight)

    dist = 0.0
    prev_stop = north_pole
    prev_weight = sum(weights)
    for i in range(len(tuples)):
        # each leg counts its distance times the weight still on the sleigh
        dist = dist + distance(tuples[i], prev_stop) * prev_weight
        prev_stop = tuples[i]
        prev_weight = prev_weight - weights[i]
    return dist


def weighted_reindeer_weariness(all_trips):
    uniq_trips = all_trips.TripId.unique()

    # reject the whole submission if any single trip is overloaded
    if any(all_trips.groupby('TripId').Weight.sum() + sleigh_weight > weight_limit):
        raise Exception("One of the sleighs over weight limit!")

    dist = 0.0
    for t in uniq_trips:
        this_trip = all_trips[all_trips.TripId == t]
        dist = dist + weighted_trip_length(this_trip[['Latitude', 'Longitude']], this_trip.Weight.tolist())
    return dist


gifts = pd.read_csv('gifts.csv')
sample_sub = pd.read_csv('sample_submission.csv')
all_trips = sample_sub.merge(gifts, on='GiftId')

print(weighted_reindeer_weariness(all_trips))
```

**Visualise – See Map Data**

Take the gifts.csv file – apply to Fusion Tables

**reindeer.py**

**The Sample Solution Value – 144525525772**

**Can You Improve on This?**
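
One possible starting point, offered only as a sketch and not as a known-good solution: sort the gifts by longitude and fill each trip until the next gift would exceed the capacity. The helper `greedy_trips`, the sample gifts, and the capacity of 990 (weight limit minus sleigh weight) are assumptions for the example, and it assumes no single gift is heavier than the capacity. Whether this beats the sample value has to be checked by scoring the result with reindeer.py.

```python
def greedy_trips(gifts, capacity):
    """Group gifts into trips: sort by longitude, then fill each trip
    until adding the next gift would exceed the capacity."""
    trips = []
    current, load = [], 0.0
    for gift in sorted(gifts, key=lambda g: g["Longitude"]):
        if current and load + gift["Weight"] > capacity:
            trips.append(current)
            current, load = [], 0.0
        current.append(gift["GiftId"])
        load += gift["Weight"]
    if current:
        trips.append(current)
    return trips

# Made-up gifts for illustration only.
gifts = [
    {"GiftId": 1, "Longitude": 10.0, "Weight": 600.0},
    {"GiftId": 2, "Longitude": -50.0, "Weight": 500.0},
    {"GiftId": 3, "Longitude": 12.0, "Weight": 300.0},
    {"GiftId": 4, "Longitude": -48.0, "Weight": 400.0},
    {"GiftId": 5, "Longitude": 100.0, "Weight": 200.0},
]

trips = greedy_trips(gifts, capacity=990)
for trip_id, gift_ids in enumerate(trips):
    print(trip_id, gift_ids)
```

Grouping by longitude keeps gifts on the same trip roughly in the same slice of the globe; the order of stops within a trip, and heavier gifts being dropped off early, are further things to experiment with.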