“Faster, Higher, Stronger” — ML on Olympics

阿新 • Published: 2018-12-28

Data

The dataset has 15 attributes and about 271k samples, which makes it the largest dataset I will be studying on Medium. The main purpose of this study is some gold digging, using the given information on whether or not athletes win a medal.

Attributes:

ID     -> Each athlete's unique number
Name   -> 134732 unique values
Sex
Age
Height
Weight
Team
NOC    -> National Olympic Committee
Games  -> Year and season
Year   -> 1896 - 2016
Season -> Summer or Winter
City
Sport
Event
Medal  -> Gold, Silver, Bronze, or NA

Since the dataset has 271k rows, it is worth checking for missing values first:

print(df.isnull().sum())
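A slightly richer version of this check also reports the share of missing values per column, which helps judge whether a column can be dropped or should be filled. This is a minimal sketch run on a few hypothetical toy rows rather than the real athlete_events.csv; `missing_report` is an illustrative helper name, not part of pandas. Any column stored as NaN shows up in the report, so in the toy rows a Medal column where no-medal rows are NaN is counted as missing too.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest count first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Hypothetical toy rows standing in for athlete_events.csv
toy = pd.DataFrame({
    "Age": [24.0, np.nan, 31.0, np.nan],
    "Height": [180.0, 175.0, np.nan, 168.0],
    "Medal": ["Gold", np.nan, np.nan, "Bronze"],
})
print(missing_report(toy))
```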

The output shows that the Age, Height and Weight columns have missing values. These columns are important for getting accurate results, so they cannot simply be dropped; their missing values need to be filled in instead.

Preprocessing

First, the data is loaded into a pandas DataFrame:

import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv('athlete_events.csv')

The missing Age, Weight and Height values need to be filled appropriately. In addition, several text columns need to be converted to numbers: Name, Sex, Team, NOC, Games, Season, City, Sport and Event.

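Medal values are numbered with an explicit mapping. Numbering them with pandas' df.groupby(['Medal']).ngroup() would instead assign codes in sorted order starting from 0 (Bronze 0, Gold 1, Silver 2), which does not match the encoding below.

# Gold/Silver/Bronze get fixed codes; NaN (no medal) becomes -1
df['Medal'] = df['Medal'].map({'Gold': 1, 'Silver': 2, 'Bronze': 3}).fillna(-1).astype(int)

The resulting encoding is:

Gold -> 1
Silver -> 2
Bronze -> 3
No medal -> -1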
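To see how the codes from groupby().ngroup() differ from the Gold/Silver/Bronze numbering listed above, here is a minimal sketch on hypothetical toy rows that computes both side by side. The name `MEDAL_CODES` is illustrative. The row with no medal is left out of the comparison in the checks, because how ngroup() codes a missing key has differed between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not real athlete data
medals = pd.DataFrame({"Medal": ["Gold", "Silver", np.nan, "Bronze", "Gold"]})

# groupby().ngroup() numbers groups in sorted key order, starting at 0
alphabetical = medals.groupby("Medal").ngroup()

# An explicit mapping pins the encoding Gold=1, Silver=2, Bronze=3; NaN -> -1
MEDAL_CODES = {"Gold": 1, "Silver": 2, "Bronze": 3}
explicit = medals["Medal"].map(MEDAL_CODES).fillna(-1).astype(int)

print(pd.DataFrame({
    "Medal": medals["Medal"],
    "ngroup": alphabetical,
    "explicit": explicit,
}))
```

The explicit mapping is more verbose, but the code for each medal no longer depends on alphabetical order or on the pandas version.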

Missing Weight, Age and Height values are replaced with the mean of each column, truncated to an integer:

# Fill NaNs with the column mean (mean() skips NaNs; int() truncates it)
df['Weight'] = df['Weight'].fillna(int(df['Weight'].mean()))
df['Height'] = df['Height'].fillna(int(df['Height'].mean()))
df['Age'] = df['Age'].fillna(int(df['Age'].mean()))
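The same mean filling can be done for several columns in one call by passing a dictionary of fill values to DataFrame.fillna. This is a minimal sketch on hypothetical toy rows; `fill_with_truncated_mean` is an illustrative helper, not a pandas function. It returns a new frame and leaves the input untouched.

```python
import numpy as np
import pandas as pd


def fill_with_truncated_mean(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy where NaNs in `columns` are replaced by int(column mean)."""
    # mean() ignores NaN values; int() truncates toward zero
    fill_values = {col: int(frame[col].mean()) for col in columns}
    return frame.fillna(value=fill_values)


# Hypothetical toy rows, not real athlete data
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 27.0],
    "Height": [170.0, 181.0, np.nan],
    "Weight": [np.nan, 80.0, 65.0],
})
filled = fill_with_truncated_mean(toy, ["Age", "Height", "Weight"])
print(filled)
```

Here the Age mean is 23.5, so the missing age becomes 23. Truncating the mean like this is fine for values measured in whole years, centimetres or kilograms.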

The other text columns are numbered with pandas' groupby().ngroup():

# Replace each text column with one integer code per distinct value (sorted order, from 0)
for col in ['Name', 'Sex', 'Team', 'NOC', 'Games', 'Season', 'City', 'Sport', 'Event']:
    df[col] = df.groupby([col]).ngroup()
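The ngroup() calls above discard the original labels, so a code such as City 2 can no longer be turned back into a city name. If decoding is needed later, pd.factorize(..., sort=True) is one option. For columns without missing values it gives the same sorted codes as groupby().ngroup(), and it also returns the sorted labels. This is a minimal sketch on hypothetical toy rows; `encode_columns` and `lookups` are illustrative names.

```python
import pandas as pd


def encode_columns(frame: pd.DataFrame, columns: list):
    """Replace each text column with integer codes; also return code -> label lookups."""
    encoded = frame.copy()
    lookups = {}
    for col in columns:
        # sort=True orders codes by sorted label, matching groupby().ngroup()
        codes, uniques = pd.factorize(encoded[col], sort=True)
        encoded[col] = codes
        lookups[col] = list(uniques)  # position i holds the label for code i
    return encoded, lookups


# Hypothetical toy rows, not real athlete data
toy = pd.DataFrame({
    "Sex": ["M", "F", "M"],
    "Season": ["Summer", "Winter", "Summer"],
    "City": ["Barcelona", "London", "Antwerpen"],
})
encoded, lookups = encode_columns(toy, ["Sex", "Season", "City"])
print(encoded)
print(lookups["City"])
```

Decoding is then a list lookup, for example `lookups["City"][code]`.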

After these operations, all columns are filled and numbered except the Age column:
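A statement like this can be checked on the real DataFrame with a small helper that lists every column still containing missing values or still holding non-numeric data. This is a minimal sketch on hypothetical toy rows; `unfinished_columns` is an illustrative name, not a pandas function.

```python
import numpy as np
import pandas as pd


def unfinished_columns(frame: pd.DataFrame) -> list:
    """Columns that still contain NaN or are not numeric after preprocessing."""
    has_nan = [col for col in frame.columns if frame[col].isnull().any()]
    non_numeric = list(frame.select_dtypes(exclude="number").columns)
    return sorted(set(has_nan) | set(non_numeric))


# Hypothetical toy rows, not real athlete data
toy = pd.DataFrame({
    "Age": [23.0, np.nan],
    "Sex": [1, 0],
    "City": ["London", "Paris"],
})
print(unfinished_columns(toy))
```

An empty list means every column is both filled and numeric.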
