
Using sklearn's methods to split a dataset

Parameters of train_test_split
test_size : float, int, None, optional
    If float, should be between 0.0 and 1.0 and represent the proportion
    of the dataset to include in the test split. If int, represents the
    absolute number of test samples. If None, the value is set to the
    complement of the train size. By default, the value is set to 0.25.
    The default will change in version 0.21. It will remain 0.25 only
    if ``train_size`` is unspecified, otherwise it will complement
    the specified ``train_size``.

train_size : float, int, or None, default None
    If float, should be between 0.0 and 1.0 and represent the
    proportion of the dataset to include in the train split. If
    int, represents the absolute number of train samples. If None,
    the value is automatically set to the complement of the test size.

random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator;
    If RandomState instance, random_state is the random number generator;
    If None, the random number generator is the RandomState instance used
    by `np.random`.

shuffle : boolean, optional (default=True)
    Whether or not to shuffle the data before splitting. If shuffle=False
    then stratify must be None.

stratify : array-like or None (default is None)
    If not None, data is split in a stratified fashion, using this as
    the class labels.
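The less obvious options above are shuffle and stratify. Below is a minimal sketch (my own illustration, using the same iris data as the full example further down) of a stratified split: passing the label array as stratify keeps each class's proportion the same in the training and test subsets, which matters for imbalanced data, while shuffle=False would instead take the split in the original order and requires stratify to be None.

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)   # 150 samples, 50 per class

# Stratified 80/20 split: class proportions are preserved in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(np.bincount(y_tr))   # [40 40 40] -> 80% of each class
print(np.bincount(y_te))   # [10 10 10] -> 20% of each class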

 

from sklearn.model_selection import train_test_split
from sklearn import datasets

iris = datasets.load_iris()  # load the iris dataset
X = iris.data
y = iris.target
# split the data 80% train / 20% test; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape, X_test.shape, y_test.shape   # sizes of the training and test sets
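Since iris has 150 samples with 4 features, the last line evaluates to (120, 4), (120,), (30, 4), (30,): 120 training samples and 30 test samples, i.e. the 80/20 split requested by test_size=0.2. Fixing random_state=42 makes the shuffle, and therefore the split, reproducible across runs.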