
R: Randomly Splitting a Dataset into Equal Parts for K-Fold Cross-Validation


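Today, while reading Professor Wu Xizhi's 《復雜數據統計方法》 (Statistical Methods for Complex Data), I came across the problem of splitting a dataset into several subsets by a factor and then randomly dividing each subset into n equal parts. Professor Wu's approach is easy enough to follow, but I still found it a bit tedious, so I wrote a function of my own. From now on, whenever this problem comes up, I only need to call the function.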

The example uses the iris dataset that ships with R:

> str(iris)
'data.frame':	150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

  

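The structure of the iris dataset is shown above. Species is a factor with three levels, so the data can be split into three subsets by Species. To run five-fold cross-validation within each subset, each subset has to be divided into five equal parts. The R code is as follows: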

fiveDivide <- function(col, data, n = 5)
{
  # col:  name (a string) of a factor column in data; each level is one group
  # data: a data.frame
  # n:    number of parts to divide each group into, default 5
  # Returns a list with one element per factor level; each element is
  # itself a list of n data.frames.
  # sample(x) returns the numbers 1..x in random order; the shuffled rows
  # are then cut into n consecutive blocks.
  group_num = length(levels(data[, col]))  # number of factor levels
  lst1 = list()  # the original data split by factor level into group_num subsets
  lst3 = list()  # the result: for each group, its list of n parts
  for(i in 1:group_num)
  {
    lst1[[i]] = data[data[, col] == levels(data[, col])[i], ]  # split the data by factor level
  }
  for(k in 1:group_num)  # shuffle each subset with sample() and cut it into n parts
  {
    stopifnot(nrow(lst1[[k]]) >= n)  # each group needs at least n rows
    od = sample(nrow(lst1[[k]]))
    newdata = lst1[[k]][od, ]
    len = length(od)
    cutpoint = floor(len/n)
    lst2 = list()  # the n parts of the current group
    for(j in 1:n)
    {
      if(j < n)
      {
        lst2[[j]] = newdata[(cutpoint*(j-1)+1):(cutpoint*j), ]
      }
      else
      {
        lst2[[j]] = newdata[(cutpoint*(j-1)+1):len, ]  # the last part also takes the remainder
      }
    }
    lst3[[k]] = lst2
  }
  return(lst3)
}
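
The cut-point arithmetic is easier to follow when looked at on its own. The sketch below is only an illustration: `fold_ranges` is a hypothetical helper, not part of the function above. It lists the first and last row index of each part when len shuffled rows are cut into blocks of floor(len/n) rows, with the last block absorbing any remainder.

```r
# Hypothetical helper: start and end row indices of each of the n parts
# when len rows are cut into consecutive blocks of floor(len / n) rows.
# The last block also receives the remainder.
fold_ranges <- function(len, n = 5) {
  stopifnot(len >= n)
  cutpoint <- floor(len / n)
  start <- cutpoint * (seq_len(n) - 1) + 1
  end <- c(cutpoint * seq_len(n - 1), len)
  data.frame(part = seq_len(n), start = start, end = end,
             size = end - start + 1)
}

fold_ranges(50, 5)  # five parts of 10 rows each
fold_ranges(52, 5)  # four parts of 10 rows, the last part has 12
```

When len is not a multiple of n, the parts are therefore not exactly equal: all of the leftover rows end up in the last part.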

  Applying it to iris:

> rep=fiveDivide("Species",iris,5)
> str(rep)
List of 3
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.8 5.2 4.8 4.7 5.5 5.1 4.8 4.4 4.8 4.9
  .. ..$ Sepal.Width : num [1:10] 3 3.5 3.4 3.2 3.5 3.7 3.1 3 3.4 3
  .. ..$ Petal.Length: num [1:10] 1.4 1.5 1.6 1.6 1.3 1.5 1.6 1.3 1.9 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5 4.7 4.8 5.2 5.1 5.1 4.9 5.4 5 5.5
  .. ..$ Sepal.Width : num [1:10] 3.5 3.2 3 3.4 3.5 3.8 3.1 3.4 3.5 4.2
  .. ..$ Petal.Length: num [1:10] 1.3 1.3 1.4 1.4 1.4 1.5 1.5 1.7 1.6 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.1 0.2 0.2 0.3 0.1 0.2 0.6 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.4 4.3 4.9 5.4 4.4 4.6 5.1 5 5.1 5.1
  .. ..$ Sepal.Width : num [1:10] 3.9 3 3.6 3.9 3.2 3.6 3.4 3.4 3.8 3.8
  .. ..$ Petal.Length: num [1:10] 1.3 1.1 1.4 1.7 1.3 1 1.5 1.6 1.9 1.6
  .. ..$ Petal.Width : num [1:10] 0.4 0.1 0.1 0.4 0.2 0.2 0.2 0.4 0.4 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.4 4.5 5.3 5 5 5.1 5.4 5.2 5.1 5.4
  .. ..$ Sepal.Width : num [1:10] 2.9 2.3 3.7 3.3 3.4 3.3 3.7 4.1 3.5 3.4
  .. ..$ Petal.Length: num [1:10] 1.4 1.3 1.5 1.4 1.5 1.7 1.5 1.5 1.4 1.5
  .. ..$ Petal.Width : num [1:10] 0.2 0.3 0.2 0.2 0.2 0.5 0.2 0.1 0.3 0.4
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.6 5.8 5 5 5 4.6 5.7 4.9 5.7 4.6
  .. ..$ Sepal.Width : num [1:10] 3.4 4 3.6 3.2 3 3.2 4.4 3.1 3.8 3.1
  .. ..$ Petal.Length: num [1:10] 1.4 1.2 1.4 1.2 1.6 1.4 1.5 1.5 1.7 1.5
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.2 0.4 0.2 0.3 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.2 6 5.8 6.3 5.5 5.8 5.8 6.1 6.2 5.6
  .. ..$ Sepal.Width : num [1:10] 2.9 3.4 2.7 3.3 2.6 2.6 2.7 3 2.2 3
  .. ..$ Petal.Length: num [1:10] 4.3 4.5 3.9 4.7 4.4 4 4.1 4.6 4.5 4.1
  .. ..$ Petal.Width : num [1:10] 1.3 1.6 1.2 1.6 1.2 1.2 1 1.4 1.5 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.4 5.6 5.7 6.6 6 6.4 5.9 6.9 6.7 5.5
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.8 3 2.2 2.9 3 3.1 3.1 2.5
  .. ..$ Petal.Length: num [1:10] 4.5 3.9 4.5 4.4 4 4.3 4.2 4.9 4.4 4
  .. ..$ Petal.Width : num [1:10] 1.5 1.1 1.3 1.4 1 1.3 1.5 1.5 1.4 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.5 5.2 6.8 6 5.7 5 6.3 5.7 5.5 5.6
  .. ..$ Sepal.Width : num [1:10] 2.8 2.7 2.8 2.9 2.9 2.3 2.5 2.8 2.3 3
  .. ..$ Petal.Length: num [1:10] 4.6 3.9 4.8 4.5 4.2 3.3 4.9 4.1 4 4.5
  .. ..$ Petal.Width : num [1:10] 1.5 1.4 1.4 1.5 1.3 1 1.5 1.3 1.3 1.5
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.6 6.7 5 6.7 5.9 6.1 5.7 5.4 6 5.1
  .. ..$ Sepal.Width : num [1:10] 2.9 3 2 3.1 3.2 2.8 2.6 3 2.7 2.5
  .. ..$ Petal.Length: num [1:10] 4.6 5 3.5 4.7 4.8 4 3.5 4.5 5.1 3
  .. ..$ Petal.Width : num [1:10] 1.3 1.7 1 1.5 1.8 1.3 1 1.5 1.6 1.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.6 6.1 6.3 7 4.9 5.7 5.5 5.5 6.1 5.6
  .. ..$ Sepal.Width : num [1:10] 2.7 2.9 2.3 3.2 2.4 3 2.4 2.4 2.8 2.9
  .. ..$ Petal.Length: num [1:10] 4.2 4.7 4.4 4.7 3.3 4.2 3.8 3.7 4.7 3.6
  .. ..$ Petal.Width : num [1:10] 1.3 1.4 1.3 1.4 1 1.2 1.1 1 1.2 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.9 6.7 6.1 6.4 6.4 6.7 5.7 6.5 6.4 6.3
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.6 2.8 3.1 3.3 2.5 3 2.7 2.9
  .. ..$ Petal.Length: num [1:10] 5.7 5.8 5.6 5.6 5.5 5.7 5 5.5 5.3 5.6
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 1.4 2.1 1.8 2.1 2 1.8 1.9 1.8
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 7.7 6.5 6.4 7.4 6.3 6.8 6 6.7 6.8
  .. ..$ Sepal.Width : num [1:10] 2.8 2.8 3.2 3.2 2.8 3.3 3 2.2 3.3 3.2
  .. ..$ Petal.Length: num [1:10] 5.1 6.7 5.1 5.3 6.1 6 5.5 5 5.7 5.9
  .. ..$ Petal.Width : num [1:10] 2.4 2 2 2.3 1.9 2.5 2.1 1.5 2.5 2.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 6.2 6 6.1 7.7 5.6 6.3 7.3 7.2 6.9
  .. ..$ Sepal.Width : num [1:10] 2.7 2.8 3 3 2.6 2.8 2.8 2.9 3 3.1
  .. ..$ Petal.Length: num [1:10] 5.1 4.8 4.8 4.9 6.9 4.9 5.1 6.3 5.8 5.4
  .. ..$ Petal.Width : num [1:10] 1.9 1.8 1.8 1.8 2.3 2 1.5 1.8 1.6 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.7 7.2 7.2 6.3 6.3 6.5 6.3 7.7 7.9 6.5
  .. ..$ Sepal.Width : num [1:10] 3 3.2 3.6 2.7 2.5 3 3.4 3.8 3.8 3
  .. ..$ Petal.Length: num [1:10] 5.2 6 6.1 4.9 5 5.8 5.6 6.7 6.4 5.2
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 2.5 1.8 1.9 2.2 2.4 2.2 2 2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 7.7 6.4 6.2 6.9 6.7 7.1 5.8 4.9 5.9 7.6
  .. ..$ Sepal.Width : num [1:10] 3 2.8 3.4 3.1 3.1 3 2.7 2.5 3 3
  .. ..$ Petal.Length: num [1:10] 6.1 5.6 5.4 5.1 5.6 5.9 5.1 4.5 5.1 6.6
  .. ..$ Petal.Width : num [1:10] 2.3 2.2 2.3 2.3 2.4 2.1 1.9 1.7 1.8 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
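
The str() output is long, so a more compact check may be handy: count the rows in every part and confirm that no row is lost or duplicated. The snippet below is a sketch under assumptions. To stay runnable on its own it builds a stand-in nested list called `parts` (a hypothetical name) with base R's split() and sample(), which is a different technique from the function above but yields the same shape.

```r
set.seed(42)  # arbitrary seed, only to make the stand-in reproducible

# Stand-in with the same shape as the result above: one list per species,
# each holding 5 data frames of randomly assigned rows.
parts <- lapply(split(iris, iris$Species), function(g) {
  fold_id <- sample(rep(seq_len(5), length.out = nrow(g)))
  unname(split(g, fold_id))
})

# Rows per part: one column per group, one row per part.
sizes <- sapply(parts, function(g) sapply(g, nrow))
sizes

# Every original row should appear exactly once across all parts.
all_rows <- unlist(lapply(parts, function(g) lapply(g, rownames)))
stopifnot(!anyDuplicated(all_rows), setequal(all_rows, rownames(iris)))
```

The same two checks should work on the list shown above by passing it in place of `parts`.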
  


  

  After the split, the data look like this:

> rep
[[1]]
[[1]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
46          4.8         3.0          1.4         0.3  setosa
28          5.2         3.5          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
31          4.8         3.1          1.6         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
25          4.8         3.4          1.9         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa

[[1]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
41          5.0         3.5          1.3         0.3  setosa
3           4.7         3.2          1.3         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
29          5.2         3.4          1.4         0.2  setosa
1           5.1         3.5          1.4         0.2  setosa
20          5.1         3.8          1.5         0.3  setosa
10          4.9         3.1          1.5         0.1  setosa
21          5.4         3.4          1.7         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
34          5.5         4.2          1.4         0.2  setosa

[[1]][[3]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
17          5.4         3.9          1.3         0.4  setosa
14          4.3         3.0          1.1         0.1  setosa
38          4.9         3.6          1.4         0.1  setosa
6           5.4         3.9          1.7         0.4  setosa
43          4.4         3.2          1.3         0.2  setosa
23          4.6         3.6          1.0         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
45          5.1         3.8          1.9         0.4  setosa
47          5.1         3.8          1.6         0.2  setosa

[[1]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
9           4.4         2.9          1.4         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa
8           5.0         3.4          1.5         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
11          5.4         3.7          1.5         0.2  setosa
33          5.2         4.1          1.5         0.1  setosa
18          5.1         3.5          1.4         0.3  setosa
32          5.4         3.4          1.5         0.4  setosa

[[1]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7           4.6         3.4          1.4         0.3  setosa
15          5.8         4.0          1.2         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
35          4.9         3.1          1.5         0.2  setosa
19          5.7         3.8          1.7         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa


[[2]]
[[2]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
98          6.2         2.9          4.3         1.3 versicolor
86          6.0         3.4          4.5         1.6 versicolor
83          5.8         2.7          3.9         1.2 versicolor
57          6.3         3.3          4.7         1.6 versicolor
91          5.5         2.6          4.4         1.2 versicolor
93          5.8         2.6          4.0         1.2 versicolor
68          5.8         2.7          4.1         1.0 versicolor
92          6.1         3.0          4.6         1.4 versicolor
69          6.2         2.2          4.5         1.5 versicolor
89          5.6         3.0          4.1         1.3 versicolor

[[2]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
52          6.4         3.2          4.5         1.5 versicolor
70          5.6         2.5          3.9         1.1 versicolor
56          5.7         2.8          4.5         1.3 versicolor
76          6.6         3.0          4.4         1.4 versicolor
63          6.0         2.2          4.0         1.0 versicolor
75          6.4         2.9          4.3         1.3 versicolor
62          5.9         3.0          4.2         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
66          6.7         3.1          4.4         1.4 versicolor
90          5.5         2.5          4.0         1.3 versicolor

[[2]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
55           6.5         2.8          4.6         1.5 versicolor
60           5.2         2.7          3.9         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
79           6.0         2.9          4.5         1.5 versicolor
97           5.7         2.9          4.2         1.3 versicolor
94           5.0         2.3          3.3         1.0 versicolor
73           6.3         2.5          4.9         1.5 versicolor
100          5.7         2.8          4.1         1.3 versicolor
54           5.5         2.3          4.0         1.3 versicolor
67           5.6         3.0          4.5         1.5 versicolor

[[2]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
59          6.6         2.9          4.6         1.3 versicolor
78          6.7         3.0          5.0         1.7 versicolor
61          5.0         2.0          3.5         1.0 versicolor
87          6.7         3.1          4.7         1.5 versicolor
71          5.9         3.2          4.8         1.8 versicolor
72          6.1         2.8          4.0         1.3 versicolor
80          5.7         2.6          3.5         1.0 versicolor
85          5.4         3.0          4.5         1.5 versicolor
84          6.0         2.7          5.1         1.6 versicolor
99          5.1         2.5          3.0         1.1 versicolor

[[2]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
95          5.6         2.7          4.2         1.3 versicolor
64          6.1         2.9          4.7         1.4 versicolor
88          6.3         2.3          4.4         1.3 versicolor
51          7.0         3.2          4.7         1.4 versicolor
58          4.9         2.4          3.3         1.0 versicolor
96          5.7         3.0          4.2         1.2 versicolor
81          5.5         2.4          3.8         1.1 versicolor
82          5.5         2.4          3.7         1.0 versicolor
74          6.1         2.8          4.7         1.2 versicolor
65          5.6         2.9          3.6         1.3 versicolor


[[3]]
[[3]][[1]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
121          6.9         3.2          5.7         2.3 virginica
109          6.7         2.5          5.8         1.8 virginica
135          6.1         2.6          5.6         1.4 virginica
129          6.4         2.8          5.6         2.1 virginica
138          6.4         3.1          5.5         1.8 virginica
125          6.7         3.3          5.7         2.1 virginica
114          5.7         2.5          5.0         2.0 virginica
117          6.5         3.0          5.5         1.8 virginica
112          6.4         2.7          5.3         1.9 virginica
104          6.3         2.9          5.6         1.8 virginica

[[3]][[2]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
115          5.8         2.8          5.1         2.4 virginica
123          7.7         2.8          6.7         2.0 virginica
111          6.5         3.2          5.1         2.0 virginica
116          6.4         3.2          5.3         2.3 virginica
131          7.4         2.8          6.1         1.9 virginica
101          6.3         3.3          6.0         2.5 virginica
113          6.8         3.0          5.5         2.1 virginica
120          6.0         2.2          5.0         1.5 virginica
145          6.7         3.3          5.7         2.5 virginica
144          6.8         3.2          5.9         2.3 virginica

[[3]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica
127          6.2         2.8          4.8         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
128          6.1         3.0          4.9         1.8 virginica
119          7.7         2.6          6.9         2.3 virginica
122          5.6         2.8          4.9         2.0 virginica
134          6.3         2.8          5.1         1.5 virginica
108          7.3         2.9          6.3         1.8 virginica
130          7.2         3.0          5.8         1.6 virginica
140          6.9         3.1          5.4         2.1 virginica

[[3]][[4]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
146          6.7         3.0          5.2         2.3 virginica
126          7.2         3.2          6.0         1.8 virginica
110          7.2         3.6          6.1         2.5 virginica
124          6.3         2.7          4.9         1.8 virginica
147          6.3         2.5          5.0         1.9 virginica
105          6.5         3.0          5.8         2.2 virginica
137          6.3         3.4          5.6         2.4 virginica
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica
148          6.5         3.0          5.2         2.0 virginica

[[3]][[5]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
136          7.7         3.0          6.1         2.3 virginica
133          6.4         2.8          5.6         2.2 virginica
149          6.2         3.4          5.4         2.3 virginica
142          6.9         3.1          5.1         2.3 virginica
141          6.7         3.1          5.6         2.4 virginica
103          7.1         3.0          5.9         2.1 virginica
102          5.8         2.7          5.1         1.9 virginica
107          4.9         2.5          4.5         1.7 virginica
150          5.9         3.0          5.1         1.8 virginica
106          7.6         3.0          6.6         2.1 virginica
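
To actually use such a nested list for five-fold cross-validation, part i of every group can be stacked into a test set and the remaining parts into a training set. The following is a minimal sketch, not the book's method. `make_fold` and `parts` are hypothetical names, and `parts` is a stand-in built with split() and sample() so that the block runs on its own.

```r
set.seed(42)  # arbitrary seed for a reproducible stand-in

# Stand-in nested list: parts[[k]][[j]] is part j of species k.
parts <- lapply(split(iris, iris$Species), function(g) {
  unname(split(g, sample(rep(seq_len(5), length.out = nrow(g)))))
})

# Hypothetical helper: part i of every group forms the test set,
# all other parts form the training set.
make_fold <- function(parts, i) {
  test <- do.call(rbind, unname(lapply(parts, function(g) g[[i]])))
  train <- do.call(rbind, unname(lapply(parts, function(g) do.call(rbind, g[-i]))))
  list(train = train, test = test)
}

fold1 <- make_fold(parts, 1)
table(fold1$test$Species)  # 10 rows of each species
nrow(fold1$train)          # 120
```

Looping i over 1 to 5 gives the five train/test pairs, and every test set keeps the three species in equal proportion.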

  
