
Using sklearn's methods to split a dataset

Parameters of train_test_split
test_size : float, int, None, optional
    If float, should be between 0.0 and 1.0 and represent the proportion
    of the dataset to include in the test split. If int, represents the
    absolute number of test samples. If None, the value is set to the
    complement of the train size. By default, the value is set to 0.25.
    The default will change in version 0.21. It will remain 0.25 only
    if ``train_size`` is unspecified, otherwise it will complement
    the specified ``train_size``.

train_size : float, int, or None, default None
    If float, should be between 0.0 and 1.0 and represent the
    proportion of the dataset to include in the train split. If
    int, represents the absolute number of train samples. If None,
    the value is automatically set to the complement of the test size.

random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator;
    If RandomState instance, random_state is the random number generator;
    If None, the random number generator is the RandomState instance used
    by `np.random`.

shuffle : boolean, optional (default=True)
    Whether or not to shuffle the data before splitting. If shuffle=False
    then stratify must be None.

stratify : array-like or None (default is None)
    If not None, data is split in a stratified fashion, using this as
    the class labels.
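The less obvious options above are shuffle and stratify. Below is a minimal sketch (my own illustration, using the same iris data as the full example further down) of a stratified split: passing the label array as stratify keeps each class's proportion the same in the training and test subsets, which matters for imbalanced data, while shuffle=False would instead take the split in the original order and requires stratify to be None.

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)   # 150 samples, 50 per class

# Stratified 80/20 split: class proportions are preserved in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(np.bincount(y_tr))   # [40 40 40] -> 80% of each class
print(np.bincount(y_te))   # [10 10 10] -> 20% of each class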

 

from sklearn.model_selection import train_test_split
from sklearn import datasets

iris = datasets.load_iris()  # load the iris dataset
X = iris.data
y = iris.target
# split the data 80% train / 20% test; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape, X_test.shape, y_test.shape   # sizes of the training and test sets
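Since iris has 150 samples with 4 features, the last line evaluates to (120, 4), (120,), (30, 4), (30,): 120 training samples and 30 test samples, i.e. the 80/20 split requested by test_size=0.2. Fixing random_state=42 makes the shuffle, and therefore the split, reproducible across runs.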