Python Programming Beginner Study Notes (10)
阿新 • Published: 2018-12-25
<h1 style="text-align:center">Titanic Data Processing and Analysis</h1>

![](http://www.allengao.cn/wp-content/uploads/2018/06/Titanic.jpg)

```python
import pandas as pd
%matplotlib inline
```

#### Import the data

```python
titanic = pd.read_csv('K:/Code/jupyter-notebook/Python Study/train.csv')
```

#### Quick preview

```python
titanic.head()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> </tr> <tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> </tr> <tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> </tr> <tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> </tr> <tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. 
William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> </tr> </tbody> </table> </div>

|Column|Meaning|
|---|---|
|Pclass|Ticket class, a proxy for social class (1 = upper; 2 = middle; 3 = lower)|
|Survived|Whether the passenger survived|
|Name|Name|
|Sex|Sex|
|Age|Age|
|SibSp|Number of siblings and spouses aboard (sibling, spouse)|
|Parch|Number of parents and children aboard|
|Ticket|Ticket number|
|Fare|Ticket price|
|Cabin|Cabin|
|Embarked|Port of embarkation|

```python
titanic.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6+ KB

```python
# Simple summary statistics for all numeric columns
titanic.describe()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Fare</th> </tr> </thead> <tbody> <tr> <th>count</th> <td>891.000000</td> <td>891.000000</td> <td>891.000000</td> <td>714.000000</td> <td>891.000000</td> <td>891.000000</td> <td>891.000000</td> </tr> <tr> <th>mean</th> <td>446.000000</td> <td>0.383838</td> <td>2.308642</td> <td>29.699118</td> <td>0.523008</td> <td>0.381594</td> <td>32.204208</td> </tr> <tr> <th>std</th> <td>257.353842</td> <td>0.486592</td> <td>0.836071</td> <td>14.526497</td> <td>1.102743</td> <td>0.806057</td> <td>49.693429</td> </tr> <tr> <th>min</th> <td>1.000000</td> <td>0.000000</td> <td>1.000000</td> <td>0.420000</td> <td>0.000000</td> <td>0.000000</td> <td>0.000000</td> </tr> <tr> <th>25%</th> <td>223.500000</td> 
<td>0.000000</td> <td>2.000000</td> <td>20.125000</td> <td>0.000000</td> <td>0.000000</td> <td>7.910400</td> </tr> <tr> <th>50%</th> <td>446.000000</td> <td>0.000000</td> <td>3.000000</td> <td>28.000000</td> <td>0.000000</td> <td>0.000000</td> <td>14.454200</td> </tr> <tr> <th>75%</th> <td>668.500000</td> <td>1.000000</td> <td>3.000000</td> <td>38.000000</td> <td>1.000000</td> <td>0.000000</td> <td>31.000000</td> </tr> <tr> <th>max</th> <td>891.000000</td> <td>1.000000</td> <td>3.000000</td> <td>80.000000</td> <td>8.000000</td> <td>6.000000</td> <td>512.329200</td> </tr> </tbody> </table> </div>

```python
# isnull() flags missing values; sum() counts them per column
titanic.isnull().sum()
```

    PassengerId      0
    Survived         0
    Pclass           0
    Name             0
    Sex              0
    Age            177
    SibSp            0
    Parch            0
    Ticket           0
    Fare             0
    Cabin          687
    Embarked         2
    dtype: int64

#### Handling missing values

```python
# Fill every missing value in the whole DataFrame (uncomment to try it)
#titanic.fillna(0)
# Fill a single column
#titanic.Age.fillna(0)
# Median age
titanic.Age.median()
# Fill with the median age; this returns a new Series
# titanic.Age.fillna(titanic.Age.median())
# Fill and assign back to the column, so the DataFrame itself is updated
# (the original inplace=True call on a selected column may not update the DataFrame in newer pandas versions)
titanic['Age'] = titanic.Age.fillna(titanic.Age.median())
# Check the missing values in Age again
titanic.isnull().sum()
```

### Analysis by sex

```python
# A simple summary count; used very often
titanic.Sex.value_counts()
```

    male      577
    female    314
    Name: Sex, dtype: int64

```python
# Number of men and women among the survivors
survived = titanic[titanic.Survived==1].Sex.value_counts()
```

```python
# Number of men and women among those who did not survive
dead = titanic[titanic.Survived==0].Sex.value_counts()
```

```python
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df.plot.bar()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496afd27f0>

![png](output_17_1.png)

```python
# The plot works, but it is not the effect we want
# Transpose the DataFrame to swap rows and columns
df = df.T
df
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>survived</th> <th>dead</th> </tr> </thead> <tbody> <tr> <th>female</th> <td>233</td> 
<td>81</td> </tr> <tr> <th>male</th> <td>109</td> <td>468</td> </tr> </tbody> </table> </div>

```python
df.plot.bar()
# Equivalent to df.plot(kind='bar')
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d1d7940>

![png](output_19_1.png)

```python
# Still not the result we want; stack the bars
df.plot(kind = 'bar',stacked = True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d22aef0>

![png](output_20_1.png)

```python
# Proportion of survivors among men and among women
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d2b7470>

![png](output_21_1.png)

#### The chart above shows that sex has a strong influence on survival

### Analysis by age

```python
# Simple count
# titanic.Age.value_counts()
```

```python
survived = titanic[titanic.Survived==1].Age
dead = titanic[titanic.Survived==0].Age
df = pd.DataFrame([survived,dead],index=['survived','dead'])
df = df.T
df.plot.hist(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496d3c4be0>

![png](output_25_1.png)

```python
# Use more bins so the histogram shows more bars
df.plot.hist(stacked = True,bins = 30)
# The very tall bar in the middle appears because all missing ages were replaced with the median
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e42f588>

![png](output_26_1.png)

```python
# A density plot is a bit more intuitive
df.plot.kde()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e4c7dd8>

![png](output_27_1.png)

```python
# Look at the age distribution to decide the x-axis range of the plot
titanic.Age.describe()
```

    count    891.000000
    mean      29.361582
    std       13.019697
    min        0.420000
    25%       22.000000
    50%       28.000000
    75%       35.000000
    max       80.000000
    Name: Age, dtype: float64

```python
# Restrict the x-axis range
df.plot.kde(xlim=(0,80))
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496e511c18>

![png](output_29_1.png)

```python
age = 16
young = titanic[titanic.Age<=age]['Survived'].value_counts()
old = titanic[titanic.Age>age]['Survived'].value_counts()
df = pd.DataFrame([young,old],index = ['young','old'])
# Rename by label (0 = dead, 1 = survived); value_counts() orders by frequency,
# so assigning names by column position could swap the two labels
df = df.rename(columns={0: 'dead', 1: 'survived'})
df.plot.bar(stacked = True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f3a3b70>

![png](output_30_1.png)

```python
# Proportion of survivors among passengers over 16 and among those 16 or younger
df['p_survived'] = df.survived / (df.survived + df.dead)
df['p_dead'] = df.dead / (df.survived + df.dead)
df[['p_survived','p_dead']].plot.bar(stacked=True)
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f407c50>

![png](output_31_1.png)

### Analysis by fare

```python
# Fare can be examined the same way as age
survived = titanic[titanic.Survived==1].Fare
dead = titanic[titanic.Survived==0].Fare
df = pd.DataFrame([survived,dead],index = ['survived','dead'])
df = df.T
df.plot.kde()
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f47b978>

![png](output_33_1.png)

```python
# Check the range of fares first, in order to set xlim
titanic.Fare.describe()
```

    count    891.000000
    mean      32.204208
    std       49.693429
    min        0.000000
    25%        7.910400
    50%       14.454200
    75%       31.000000
    max      512.329200
    Name: Fare, dtype: float64

```python
df.plot(kind = 'kde',xlim = (0,513))
```

    <matplotlib.axes._subplots.AxesSubplot at 0x1496f45bba8>

![png](output_35_1.png)

#### Passengers who paid low fares had a lower survival rate

### Combined features

```python
# For example, look at the effect of age and fare on survival at the same time
import matplotlib.pyplot as plt
plt.scatter(titanic[titanic.Survived==0].Age, titanic[titanic.Survived==0].Fare)
```

    <matplotlib.collections.PathCollection at 0x1496f597a58>

![png](output_38_1.png)

```python
# The previous plot is not very attractive; adjust the marker style
ax = plt.subplot()
# Passengers who did not survive
age = titanic[titanic.Survived==0].Age
fare = titanic[titanic.Survived==0].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='gray')
# Survivors
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.3,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```

    Text(0,0.5,'fare')

![png](output_39_1.png)

```python
# Survivors only
ax = plt.subplot()
age = titanic[titanic.Survived==1].Age
fare = titanic[titanic.Survived==1].Fare
plt.scatter(age, fare,s=20,alpha=0.5,linewidths=1,edgecolors='red')
ax.set_xlabel('age')
ax.set_ylabel('fare')
```

    Text(0,0.5,'fare')

![png](output_40_1.png)

### Hidden features

```python
# Extract the title (Mr, Mrs, Miss) from the name
titanic.Name
```

    0      Braund, Mr. Owen Harris
    1      Cumings, Mrs. John Bradley (Florence Briggs Th...
    2      Heikkinen, Miss. Laina
    3      Futrelle, Mrs. Jacques Heath (Lily May Peel)
    4      Allen, Mr. William Henry
    5      Moran, Mr. James
    6      McCarthy, Mr. Timothy J
    7      Palsson, Master. Gosta Leonard
    8      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
    9      Nasser, Mrs. Nicholas (Adele Achem)
    10     Sandstrom, Miss. Marguerite Rut
    11     Bonnell, Miss. Elizabeth
    12     Saundercock, Mr. William Henry
    13     Andersson, Mr. Anders Johan
    14     Vestrom, Miss. Hulda Amanda Adolfina
    15     Hewlett, Mrs. (Mary D Kingcome)
    16     Rice, Master. Eugene
    17     Williams, Mr. Charles Eugene
    18     Vander Planke, Mrs. Julius (Emelia Maria Vande...
    19     Masselmani, Mrs. Fatima
    20     Fynney, Mr. Joseph J
    21     Beesley, Mr. Lawrence
    22     McGowan, Miss. Anna "Annie"
    23     Sloper, Mr. William Thompson
    24     Palsson, Miss. Torborg Danira
    25     Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
    26     Emir, Mr. Farred Chehab
    27     Fortune, Mr. Charles Alexander
    28     O'Dwyer, Miss. Ellen "Nellie"
    29     Todoroff, Mr. Lalio
    ...
    861    Giles, Mr. Frederick Edward
    862    Swift, Mrs. Frederick Joel (Margaret Welles Ba...
    863    Sage, Miss. Dorothy Edith "Dolly"
    864    Gill, Mr. John William
    865    Bystrom, Mrs. (Karolina)
    866    Duran y More, Miss. Asuncion
    867    Roebling, Mr. Washington Augustus II
    868    van Melkebeke, Mr. Philemon
    869    Johnson, Master. Harold Theodor
    870    Balkic, Mr. Cerin
    871    Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
    872    Carlsson, Mr. Frans Olof
    873    Vander Cruyssen, Mr. Victor
    874    Abelson, Mrs. Samuel (Hannah Wizosky)
    875    Najib, Miss. Adele Kiamie "Jane"
    876    Gustafsson, Mr. Alfred Ossian
    877    Petroff, Mr. Nedelio
    878    Laleff, Mr. Kristo
    879    Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
    880    Shelley, Mrs. William (Imanita Parrish Hall)
    881    Markun, Mr. Johann
    882    Dahlberg, Miss. Gerda Ulrika
    883    Banfield, Mr. Frederick James
    884    Sutehall, Mr. Henry Jr
    885    Rice, Mrs. William (Margaret Norton)
    886    Montvila, Rev. Juozas
    887    Graham, Miss. Margaret Edith
    888    Johnston, Miss. Catherine Helen "Carrie"
    889    Behr, Mr. Karl Howell
    890    Dooley, Mr. Patrick
    Name: Name, Length: 891, dtype: object

```python
# The title is the text between the comma and the first period
titanic['title'] = titanic.Name.apply(lambda name: name.split(',')[1].split('.')[0].strip())
```

```python
# Try the extraction on a single name
s = 'Williams, Mr.Howard Hugh "harry"'
s.split(',')[-1].split('.')[0].strip()
```

    'Mr'

```python
titanic.title.value_counts()
# For example, if a passenger's title is Mr and the age is unknown, the mean age of
# all passengers titled Mr can be used instead of the simple overall median used earlier.
```

    Mr              517
    Miss            182
    Mrs             125
    Master           40
    Dr                7
    Rev               6
    Mlle              2
    Major             2
    Col               2
    Capt              1
    Ms                1
    Mme               1
    Jonkheer          1
    the Countess      1
    Don               1
    Lady              1
    Sir               1
    Name: title, dtype: int64

### GDP

```python
### Night-light imagery: the brightness of night-light images can serve as a simple proxy for GDP
```

```python
titanic.head()
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> <th>title</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> <td>Mr</td> </tr> <tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> <td>Mrs</td> </tr> <tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> <td>Miss</td> </tr> <tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. 
Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> <td>Mrs</td> </tr> <tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> </tr> </tbody> </table> </div>

```python
# Family size = siblings/spouses + parents/children + the passenger
titanic['family_size'] = titanic.SibSp + titanic.Parch + 1
```

```python
titanic
```

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>PassengerId</th> <th>Survived</th> <th>Pclass</th> <th>Name</th> <th>Sex</th> <th>Age</th> <th>SibSp</th> <th>Parch</th> <th>Ticket</th> <th>Fare</th> <th>Cabin</th> <th>Embarked</th> <th>title</th> <th>family_size</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>1</td> <td>0</td> <td>3</td> <td>Braund, Mr. Owen Harris</td> <td>male</td> <td>22.0</td> <td>1</td> <td>0</td> <td>A/5 21171</td> <td>7.2500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>2</td> </tr> <tr> <th>1</th> <td>2</td> <td>1</td> <td>1</td> <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>0</td> <td>PC 17599</td> <td>71.2833</td> <td>C85</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>2</th> <td>3</td> <td>1</td> <td>3</td> <td>Heikkinen, Miss. Laina</td> <td>female</td> <td>26.0</td> <td>0</td> <td>0</td> <td>STON/O2. 3101282</td> <td>7.9250</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>3</th> <td>4</td> <td>1</td> <td>1</td> <td>Futrelle, Mrs. 
Jacques Heath (Lily May Peel)</td> <td>female</td> <td>35.0</td> <td>1</td> <td>0</td> <td>113803</td> <td>53.1000</td> <td>C123</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>4</th> <td>5</td> <td>0</td> <td>3</td> <td>Allen, Mr. William Henry</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>373450</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>5</th> <td>6</td> <td>0</td> <td>3</td> <td>Moran, Mr. James</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>330877</td> <td>8.4583</td> <td>NaN</td> <td>Q</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>6</th> <td>7</td> <td>0</td> <td>1</td> <td>McCarthy, Mr. Timothy J</td> <td>male</td> <td>54.0</td> <td>0</td> <td>0</td> <td>17463</td> <td>51.8625</td> <td>E46</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>7</th> <td>8</td> <td>0</td> <td>3</td> <td>Palsson, Master. Gosta Leonard</td> <td>male</td> <td>2.0</td> <td>3</td> <td>1</td> <td>349909</td> <td>21.0750</td> <td>NaN</td> <td>S</td> <td>Master</td> <td>5</td> </tr> <tr> <th>8</th> <td>9</td> <td>1</td> <td>3</td> <td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td> <td>female</td> <td>27.0</td> <td>0</td> <td>2</td> <td>347742</td> <td>11.1333</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>3</td> </tr> <tr> <th>9</th> <td>10</td> <td>1</td> <td>2</td> <td>Nasser, Mrs. Nicholas (Adele Achem)</td> <td>female</td> <td>14.0</td> <td>1</td> <td>0</td> <td>237736</td> <td>30.0708</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>10</th> <td>11</td> <td>1</td> <td>3</td> <td>Sandstrom, Miss. Marguerite Rut</td> <td>female</td> <td>4.0</td> <td>1</td> <td>1</td> <td>PP 9549</td> <td>16.7000</td> <td>G6</td> <td>S</td> <td>Miss</td> <td>3</td> </tr> <tr> <th>11</th> <td>12</td> <td>1</td> <td>1</td> <td>Bonnell, Miss. 
Elizabeth</td> <td>female</td> <td>58.0</td> <td>0</td> <td>0</td> <td>113783</td> <td>26.5500</td> <td>C103</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>12</th> <td>13</td> <td>0</td> <td>3</td> <td>Saundercock, Mr. William Henry</td> <td>male</td> <td>20.0</td> <td>0</td> <td>0</td> <td>A/5. 2151</td> <td>8.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>13</th> <td>14</td> <td>0</td> <td>3</td> <td>Andersson, Mr. Anders Johan</td> <td>male</td> <td>39.0</td> <td>1</td> <td>5</td> <td>347082</td> <td>31.2750</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>7</td> </tr> <tr> <th>14</th> <td>15</td> <td>0</td> <td>3</td> <td>Vestrom, Miss. Hulda Amanda Adolfina</td> <td>female</td> <td>14.0</td> <td>0</td> <td>0</td> <td>350406</td> <td>7.8542</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>15</th> <td>16</td> <td>1</td> <td>2</td> <td>Hewlett, Mrs. (Mary D Kingcome)</td> <td>female</td> <td>55.0</td> <td>0</td> <td>0</td> <td>248706</td> <td>16.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>16</th> <td>17</td> <td>0</td> <td>3</td> <td>Rice, Master. Eugene</td> <td>male</td> <td>2.0</td> <td>4</td> <td>1</td> <td>382652</td> <td>29.1250</td> <td>NaN</td> <td>Q</td> <td>Master</td> <td>6</td> </tr> <tr> <th>17</th> <td>18</td> <td>1</td> <td>2</td> <td>Williams, Mr. Charles Eugene</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>244373</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>18</th> <td>19</td> <td>0</td> <td>3</td> <td>Vander Planke, Mrs. Julius (Emelia Maria Vande...</td> <td>female</td> <td>31.0</td> <td>1</td> <td>0</td> <td>345763</td> <td>18.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>19</th> <td>20</td> <td>1</td> <td>3</td> <td>Masselmani, Mrs. 
Fatima</td> <td>female</td> <td>28.0</td> <td>0</td> <td>0</td> <td>2649</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>20</th> <td>21</td> <td>0</td> <td>2</td> <td>Fynney, Mr. Joseph J</td> <td>male</td> <td>35.0</td> <td>0</td> <td>0</td> <td>239865</td> <td>26.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>21</th> <td>22</td> <td>1</td> <td>2</td> <td>Beesley, Mr. Lawrence</td> <td>male</td> <td>34.0</td> <td>0</td> <td>0</td> <td>248698</td> <td>13.0000</td> <td>D56</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>22</th> <td>23</td> <td>1</td> <td>3</td> <td>McGowan, Miss. Anna "Annie"</td> <td>female</td> <td>15.0</td> <td>0</td> <td>0</td> <td>330923</td> <td>8.0292</td> <td>NaN</td> <td>Q</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>23</th> <td>24</td> <td>1</td> <td>1</td> <td>Sloper, Mr. William Thompson</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>113788</td> <td>35.5000</td> <td>A6</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>24</th> <td>25</td> <td>0</td> <td>3</td> <td>Palsson, Miss. Torborg Danira</td> <td>female</td> <td>8.0</td> <td>3</td> <td>1</td> <td>349909</td> <td>21.0750</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>5</td> </tr> <tr> <th>25</th> <td>26</td> <td>1</td> <td>3</td> <td>Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...</td> <td>female</td> <td>38.0</td> <td>1</td> <td>5</td> <td>347077</td> <td>31.3875</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>7</td> </tr> <tr> <th>26</th> <td>27</td> <td>0</td> <td>3</td> <td>Emir, Mr. Farred Chehab</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>2631</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>27</th> <td>28</td> <td>0</td> <td>1</td> <td>Fortune, Mr. 
Charles Alexander</td> <td>male</td> <td>19.0</td> <td>3</td> <td>2</td> <td>19950</td> <td>263.0000</td> <td>C23 C25 C27</td> <td>S</td> <td>Mr</td> <td>6</td> </tr> <tr> <th>28</th> <td>29</td> <td>1</td> <td>3</td> <td>O'Dwyer, Miss. Ellen "Nellie"</td> <td>female</td> <td>28.0</td> <td>0</td> <td>0</td> <td>330959</td> <td>7.8792</td> <td>NaN</td> <td>Q</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>29</th> <td>30</td> <td>0</td> <td>3</td> <td>Todoroff, Mr. Lalio</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>349216</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> <tr> <th>861</th> <td>862</td> <td>0</td> <td>2</td> <td>Giles, Mr. Frederick Edward</td> <td>male</td> <td>21.0</td> <td>1</td> <td>0</td> <td>28134</td> <td>11.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>2</td> </tr> <tr> <th>862</th> <td>863</td> <td>1</td> <td>1</td> <td>Swift, Mrs. Frederick Joel (Margaret Welles Ba...</td> <td>female</td> <td>48.0</td> <td>0</td> <td>0</td> <td>17466</td> <td>25.9292</td> <td>D17</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>863</th> <td>864</td> <td>0</td> <td>3</td> <td>Sage, Miss. Dorothy Edith "Dolly"</td> <td>female</td> <td>28.0</td> <td>8</td> <td>2</td> <td>CA. 2343</td> <td>69.5500</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>11</td> </tr> <tr> <th>864</th> <td>865</td> <td>0</td> <td>2</td> <td>Gill, Mr. John William</td> <td>male</td> <td>24.0</td> <td>0</td> <td>0</td> <td>233866</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>865</th> <td>866</td> <td>1</td> <td>2</td> <td>Bystrom, Mrs. 
(Karolina)</td> <td>female</td> <td>42.0</td> <td>0</td> <td>0</td> <td>236852</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>1</td> </tr> <tr> <th>866</th> <td>867</td> <td>1</td> <td>2</td> <td>Duran y More, Miss. Asuncion</td> <td>female</td> <td>27.0</td> <td>1</td> <td>0</td> <td>SC/PARIS 2149</td> <td>13.8583</td> <td>NaN</td> <td>C</td> <td>Miss</td> <td>2</td> </tr> <tr> <th>867</th> <td>868</td> <td>0</td> <td>1</td> <td>Roebling, Mr. Washington Augustus II</td> <td>male</td> <td>31.0</td> <td>0</td> <td>0</td> <td>PC 17590</td> <td>50.4958</td> <td>A24</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>868</th> <td>869</td> <td>0</td> <td>3</td> <td>van Melkebeke, Mr. Philemon</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>345777</td> <td>9.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>869</th> <td>870</td> <td>1</td> <td>3</td> <td>Johnson, Master. Harold Theodor</td> <td>male</td> <td>4.0</td> <td>1</td> <td>1</td> <td>347742</td> <td>11.1333</td> <td>NaN</td> <td>S</td> <td>Master</td> <td>3</td> </tr> <tr> <th>870</th> <td>871</td> <td>0</td> <td>3</td> <td>Balkic, Mr. Cerin</td> <td>male</td> <td>26.0</td> <td>0</td> <td>0</td> <td>349248</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>871</th> <td>872</td> <td>1</td> <td>1</td> <td>Beckwith, Mrs. Richard Leonard (Sallie Monypeny)</td> <td>female</td> <td>47.0</td> <td>1</td> <td>1</td> <td>11751</td> <td>52.5542</td> <td>D35</td> <td>S</td> <td>Mrs</td> <td>3</td> </tr> <tr> <th>872</th> <td>873</td> <td>0</td> <td>1</td> <td>Carlsson, Mr. Frans Olof</td> <td>male</td> <td>33.0</td> <td>0</td> <td>0</td> <td>695</td> <td>5.0000</td> <td>B51 B53 B55</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>873</th> <td>874</td> <td>0</td> <td>3</td> <td>Vander Cruyssen, Mr. 
Victor</td> <td>male</td> <td>47.0</td> <td>0</td> <td>0</td> <td>345765</td> <td>9.0000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>874</th> <td>875</td> <td>1</td> <td>2</td> <td>Abelson, Mrs. Samuel (Hannah Wizosky)</td> <td>female</td> <td>28.0</td> <td>1</td> <td>0</td> <td>P/PP 3381</td> <td>24.0000</td> <td>NaN</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>875</th> <td>876</td> <td>1</td> <td>3</td> <td>Najib, Miss. Adele Kiamie "Jane"</td> <td>female</td> <td>15.0</td> <td>0</td> <td>0</td> <td>2667</td> <td>7.2250</td> <td>NaN</td> <td>C</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>876</th> <td>877</td> <td>0</td> <td>3</td> <td>Gustafsson, Mr. Alfred Ossian</td> <td>male</td> <td>20.0</td> <td>0</td> <td>0</td> <td>7534</td> <td>9.8458</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>877</th> <td>878</td> <td>0</td> <td>3</td> <td>Petroff, Mr. Nedelio</td> <td>male</td> <td>19.0</td> <td>0</td> <td>0</td> <td>349212</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>878</th> <td>879</td> <td>0</td> <td>3</td> <td>Laleff, Mr. Kristo</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>349217</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>879</th> <td>880</td> <td>1</td> <td>1</td> <td>Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)</td> <td>female</td> <td>56.0</td> <td>0</td> <td>1</td> <td>11767</td> <td>83.1583</td> <td>C50</td> <td>C</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>880</th> <td>881</td> <td>1</td> <td>2</td> <td>Shelley, Mrs. William (Imanita Parrish Hall)</td> <td>female</td> <td>25.0</td> <td>0</td> <td>1</td> <td>230433</td> <td>26.0000</td> <td>NaN</td> <td>S</td> <td>Mrs</td> <td>2</td> </tr> <tr> <th>881</th> <td>882</td> <td>0</td> <td>3</td> <td>Markun, Mr. 
Johann</td> <td>male</td> <td>33.0</td> <td>0</td> <td>0</td> <td>349257</td> <td>7.8958</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>882</th> <td>883</td> <td>0</td> <td>3</td> <td>Dahlberg, Miss. Gerda Ulrika</td> <td>female</td> <td>22.0</td> <td>0</td> <td>0</td> <td>7552</td> <td>10.5167</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>883</th> <td>884</td> <td>0</td> <td>2</td> <td>Banfield, Mr. Frederick James</td> <td>male</td> <td>28.0</td> <td>0</td> <td>0</td> <td>C.A./SOTON 34068</td> <td>10.5000</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>884</th> <td>885</td> <td>0</td> <td>3</td> <td>Sutehall, Mr. Henry Jr</td> <td>male</td> <td>25.0</td> <td>0</td> <td>0</td> <td>SOTON/OQ 392076</td> <td>7.0500</td> <td>NaN</td> <td>S</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>885</th> <td>886</td> <td>0</td> <td>3</td> <td>Rice, Mrs. William (Margaret Norton)</td> <td>female</td> <td>39.0</td> <td>0</td> <td>5</td> <td>382652</td> <td>29.1250</td> <td>NaN</td> <td>Q</td> <td>Mrs</td> <td>6</td> </tr> <tr> <th>886</th> <td>887</td> <td>0</td> <td>2</td> <td>Montvila, Rev. Juozas</td> <td>male</td> <td>27.0</td> <td>0</td> <td>0</td> <td>211536</td> <td>13.0000</td> <td>NaN</td> <td>S</td> <td>Rev</td> <td>1</td> </tr> <tr> <th>887</th> <td>888</td> <td>1</td> <td>1</td> <td>Graham, Miss. Margaret Edith</td> <td>female</td> <td>19.0</td> <td>0</td> <td>0</td> <td>112053</td> <td>30.0000</td> <td>B42</td> <td>S</td> <td>Miss</td> <td>1</td> </tr> <tr> <th>888</th> <td>889</td> <td>0</td> <td>3</td> <td>Johnston, Miss. Catherine Helen "Carrie"</td> <td>female</td> <td>28.0</td> <td>1</td> <td>2</td> <td>W./C. 6607</td> <td>23.4500</td> <td>NaN</td> <td>S</td> <td>Miss</td> <td>4</td> </tr> <tr> <th>889</th> <td>890</td> <td>1</td> <td>1</td> <td>Behr, Mr. 
Karl Howell</td> <td>male</td> <td>26.0</td> <td>0</td> <td>0</td> <td>111369</td> <td>30.0000</td> <td>C148</td> <td>C</td> <td>Mr</td> <td>1</td> </tr> <tr> <th>890</th> <td>891</td> <td>0</td> <td>3</td> <td>Dooley, Mr. Patrick</td> <td>male</td> <td>32.0</td> <td>0</td> <td>0</td> <td>370376</td> <td>7.7500</td> <td>NaN</td> <td>Q</td> <td>Mr</td> <td>1</td> </tr> </tbody> </table> <p>891 rows × 14 columns</p> </div>

```python
titanic.family_size.value_counts()
```

    1     537
    2     161
    3     102
    4      29
    6      22
    5      15
    7      12
    11      7
    8       6
    Name: family_size, dtype: int64

```python
def func(family_size):
    """Map a family size to a coarse family-type label."""
    if family_size == 1:
        return 'Singleton'
    if 2 <= family_size <= 4:
        return 'SmallFamily'
    if family_size > 4:
        return 'LargeFamily'

titanic['family_type'] = titanic.family_size.apply(func)
```

```python
titanic.family_type.value_counts()
```

    Singleton      537
    SmallFamily    292
    LargeFamily     62
    Name: family_type, dtype: int64
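
An earlier comment (next to `titanic.title.value_counts()`) suggests filling a missing age with the average age of passengers who share the same title, rather than with the overall median. The block below is a minimal sketch of that idea, not code from the original notebook. It runs on a made-up six-row table instead of train.csv; the names and ages in `toy`, and the identifiers `toy`, `title_mean_age`, and `Age_filled`, are assumptions introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train.csv (all values are made up).
toy = pd.DataFrame({
    'Name': [
        'Smith, Mr. John',
        'Jones, Mr. Peter',
        'Brown, Mr. Adam',
        'Clark, Miss. Anna',
        'Evans, Miss. Mary',
        'Hill, Miss. Jane',
    ],
    'Age': [30.0, 40.0, np.nan, 20.0, np.nan, 10.0],
})

# Same title extraction as in the notes: the text between ',' and '.'.
toy['title'] = toy.Name.apply(lambda name: name.split(',')[1].split('.')[0].strip())

# Mean age per title, broadcast back to every row that has that title.
# Missing ages are ignored when the mean is computed.
title_mean_age = toy.groupby('title')['Age'].transform('mean')

# Fill only the missing ages; known ages are left untouched.
toy['Age_filled'] = toy['Age'].fillna(title_mean_age)

print(toy[['title', 'Age', 'Age_filled']])
```

On the real data the same two lines (`groupby(...).transform('mean')` followed by `fillna`) could be applied to `titanic` before the median fill. A title that has no known ages at all would still be left empty, so a fallback such as the overall median may still be needed.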