
The train_test_split() function in sklearn.model_selection

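train_test_split() is a splitter function in sklearn.model_selection that divides arrays or matrices into a training set and a test set. It is called in this form (test_size, random_state and shuffle are passed as keyword arguments): X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=..., random_state=..., shuffle=...)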
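A minimal sketch of that call form, assuming scikit-learn and NumPy are installed. The array names mirror the ones above and are only illustrative. It shows that each input array yields a (train, test) pair in the order the arrays were passed, and that samples and labels are split with the same indices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 samples with 3 features; row i holds the values 3i, 3i+1, 3i+2
train_data = np.arange(30).reshape((10, 3))
# Label i belongs to row i
train_target = np.arange(10)

# Everything after the arrays is passed by keyword; extra positional
# arguments would be treated as additional arrays to split
X_train, X_test, y_train, y_test = train_test_split(
    train_data, train_target, test_size=0.3, random_state=20, shuffle=True
)

# Each input array yields a (train, test) pair, in the order the arrays were passed
print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)

# Rows and labels are split with the same indices, so they stay paired:
# the first feature of row i is 3 * i
print(bool((X_train[:, 0] // 3 == y_train).all()))  # True
```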

Parameters:
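  • train_data: the sample data to be split
  • train_target: the labels corresponding to the samples in train_data
  • test_size: 1) a float between 0 and 1, the proportion of samples assigned to the test set (with test_size = 0.3, 30% of the samples go into X_test and the remaining 70% into X_train; the labels are split the same way into y_test and y_train); the resulting test count is rounded up. 2) an integer, the absolute number of samples placed in X_test, with the rest going into X_train. If neither test_size nor train_size is given, test_size defaults to 0.25.
  • random_state: the seed for the random number generator. With the same integer seed (0 is an ordinary seed like any other), every call produces the same split; with a different seed the split usually changes. If random_state is left unset (None), NumPy's global random state is used, so the split generally differs from run to run. To get a different but recordable seed on each run, you can set random_state = int(time.time()).
  • shuffle: whether to shuffle the data before splitting. 1) shuffle = False: the sample order is kept, so the first samples form the training set and the last ones the test set, and random_state has no effect; 2) shuffle = True (the default): the samples are shuffled before splitting.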
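The parameter behavior listed above can be checked directly. A minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (float_split, int_split, a, b) are illustrative only. It covers float vs. integer test_size (including the round-up), reproducibility with a fixed random_state, and the ordered split produced by shuffle=False.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(30).reshape((10, 3))
y = np.arange(10)

# test_size as a float: the test count is rounded up, 0.25 * 10 = 2.5 -> 3
float_split = train_test_split(X, y, test_size=0.25, random_state=0)
print(len(float_split[0]), len(float_split[1]))  # 7 3

# test_size as an int: exactly that many samples go to the test set
int_split = train_test_split(X, y, test_size=4, random_state=0)
print(len(int_split[0]), len(int_split[1]))  # 6 4

# Same integer random_state -> identical split on every call
a = train_test_split(X, y, test_size=0.3, random_state=20)
b = train_test_split(X, y, test_size=0.3, random_state=20)
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True

# shuffle=False: order is kept, first rows -> train, last rows -> test
# (no random_state is needed, since nothing is shuffled)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)
print(y_tr, y_te)  # [0 1 2 3 4 5 6] [7 8 9]
```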
Python code:
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(30).reshape((10, 3)), range(10)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20, shuffle=True)
>>> X_train
array([[15, 16, 17],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [18, 19, 20],
       [27, 28, 29],
       [12, 13, 14],
       [ 9, 10, 11]])
>>> X_test
array([[21, 22, 23],
       [ 3,  4,  5],
       [24, 25, 26]])
>>> y_train
[5, 0, 2, 6, 9, 4, 3]
>>> y_test
[7, 1, 8]
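The exact rows drawn for random_state = 20 are not guaranteed to be identical across scikit-learn or NumPy versions, so instead of comparing against the specific output above, a property check can confirm the split is sound. A minimal sketch, assuming scikit-learn and NumPy are installed; it reuses the session's inputs. Because y was passed as a range, the session shows the label splits coming back as plain lists, so they are converted with list() here before comparing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same inputs and options as the session above
X, y = np.arange(30).reshape((10, 3)), range(10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=20, shuffle=True
)

labels_train, labels_test = list(y_train), list(y_test)

# The two parts are disjoint ...
print(set(labels_train).isdisjoint(labels_test))  # True
# ... and together contain every sample exactly once
print(sorted(labels_train + labels_test) == list(range(10)))  # True
# Each X row still matches its label (row i starts with 3 * i)
print(X_train[:, 0].tolist() == [3 * i for i in labels_train])  # True
```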