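R Language Study Notes, Part 5
Summary: A personal record of learning R.
Outline:
Sorting data: the sort(), rank(), and order() functions;
Converting between long and wide data: the stack() and reshape() functions, plus melt() and dcast() from the reshape2 package;
Converting variables to factors: the factor(), cut(), and ifelse() functions, plus recode() from the car package
Main text: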
Sorting data; converting between long and wide data
■ Sorting data
◆ sort(): sorts numeric and character vectors (ascending by default)
x <- sample(1:100,10)
x
[1] 64 38 93 74 22 87 59 30 8 24
sort(x)
[1] 8 22 24 30 38 59 64 74 87 93
sort(x,decreasing = TRUE)
[1] 93 87 74 64 59 38 30 24 22 8
Example 2:
y <- c('ab','bc','cde','c')
sort(y)
[1] "ab" "bc" "c" "cde"
sort(y,decreasing = TRUE)
[1] "cde" "c" "bc" "ab"
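One detail the examples above do not show is how sort() treats missing values. The following is a minimal sketch with a made-up vector v (not part of the original notes): by default NA values are dropped, and the na.last argument keeps them.

```r
# v is a hypothetical vector used only for illustration
v <- c(3, NA, 1, 2)

sorted_default <- sort(v)                  # NA values are removed by default
sorted_na_last <- sort(v, na.last = TRUE)  # keep NA values and put them last

print(sorted_default)
print(sorted_na_last)
```

If the result must have the same length as the input, na.last = TRUE (or FALSE, to put NA first) is the safer choice.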
◆ rank(): returns the rank (position in sorted order) of each value; tied values each receive the average of the positions they occupy.
Example:
z <- c(1,2,3,3,4,4,5,6,6,6,7,8)
rank(z)
[1] 1.0 2.0 3.5 3.5 5.5 5.5 7.0 9.0 9.0 9.0 11.0
[12] 12.0
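Averaging is only the default way of handling ties. As a small sketch using the same vector z, the ties.method argument of rank() selects other rules; the variable names rank_avg, rank_min, and rank_first are chosen here for illustration.

```r
z <- c(1, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8)

rank_avg   <- rank(z)                          # default: ties.method = "average"
rank_min   <- rank(z, ties.method = "min")     # tied values all get the lowest position
rank_first <- rank(z, ties.method = "first")   # ties broken by order of appearance

print(rank_avg)
print(rank_min)
print(rank_first)
```

The "min" rule corresponds to the familiar competition ranking, where two values tied for third place are both ranked 3 and the next value is ranked 5.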
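◆ order(): the most commonly used of the three. It returns the indices of the vector's elements, arranged so that indexing the vector with them yields the elements in ascending order.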
> x
[1] 78 75 41 72 85 32 77 47 80 51
> order(x)
[1] 6 3 8 10 4 2 7 1 9 5
> x[order(x)]
[1] 32 41 47 51 72 75 77 78 80 85
order() can also sort a data frame by one or more columns:
head(iris[order(iris$Sepal.Length,iris$Sepal.Width),])
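Sorting a data frame by several keys, some of them descending, is a common follow-up. The sketch below uses a made-up data frame called scores (its columns are hypothetical); it relies on the fact that negating a numeric key reverses its order.

```r
# scores is a hypothetical data frame used only for illustration
scores <- data.frame(
  name  = c("a", "b", "c", "d"),
  group = c(2, 1, 2, 1),
  score = c(70, 85, 90, 85)
)

# Sort by group ascending, then by score descending within each group.
sorted_scores <- scores[order(scores$group, -scores$score), ]
print(sorted_scores)

# decreasing = TRUE reverses the ordering of all keys at once
by_score_desc <- scores[order(scores$score, decreasing = TRUE), ]
print(by_score_desc)
```

Negation only works for numeric keys; for character keys, decreasing = TRUE or xtfrm() would be needed instead.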
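■ Converting between long and wide data
■ Long format: the values are stacked in a single column, and a separate column records which group each value belongs to;
■ Wide format: each group has its own column, so each observational unit appears in only one row
■ stack(): stacks several vectors into one long-format data frame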
> freshman <- c(12,23,24)
> sophomores <- c(25,36,73)
> juniors <- c(32,46,57)
> data.frame(fr= freshman,so = sophomores,jun = juniors)
fr so jun
1 12 25 32
2 23 36 46
3 24 73 57
> height <- stack(list(fresh =freshman,sopho = sophomores,juni = juniors))
> height
values ind
1 12 fresh
2 23 fresh
3 24 fresh
4 25 sopho
5 36 sopho
6 73 sopho
7 32 juni
8 46 juni
9 57 juni
> tapply(height$values,height$ind,mean) # mean of each group, computed with tapply()
fresh sopho juni
19.66667 44.66667 45.00000
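The reverse direction is available in base R as well. As a minimal sketch, unstack() splits the values column by the ind column and returns one column per group; long_df and wide_df are names chosen here for illustration.

```r
freshman   <- c(12, 23, 24)
sophomores <- c(25, 36, 73)
juniors    <- c(32, 46, 57)

# Long format: one values column plus an ind column naming the group
long_df <- stack(list(fresh = freshman, sopho = sophomores, juni = juniors))

# unstack() splits values by ind, giving one column per group again
wide_df <- unstack(long_df, values ~ ind)
print(wide_df)
```

Because all three groups have the same length, the result is a data frame; with unequal group lengths unstack() returns a list instead.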
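■ reshape():
◆ To wide format: reshape(data, v.names = the measured variable to spread across columns, idvar = the identifier variable, timevar = the variable that distinguishes repeated measurements, direction = "wide")
wide <- reshape(Indometh,v.names = 'conc',idvar = 'Subject',
timevar ='time',direction ="wide")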
head(wide)
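◆ To long format: reshape(data, idvar, varying = the wide-format columns to stack into a single variable, v.names = the name of that variable, direction = "long")
> long <- reshape(wide,idvar = "Subject",varying = list(2:12),
+ v.names = "concentration",direction ="long")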
> View(long)
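The reshape() arguments are easier to follow on a very small table. The sketch below uses a made-up data frame called visits (its column names are hypothetical) and converts it from wide to long and back.

```r
# visits is a hypothetical wide-format data frame: one row per subject,
# one column per measurement occasion
visits <- data.frame(
  id      = c(1, 2, 3),
  score.1 = c(10, 20, 30),
  score.2 = c(11, 21, 31)
)

# Wide to long: stack score.1 and score.2 into a single column named "score"
visits_long <- reshape(visits,
                       idvar     = "id",
                       varying   = list(c("score.1", "score.2")),
                       v.names   = "score",
                       timevar   = "visit",
                       times     = 1:2,
                       direction = "long")
print(visits_long)

# Long back to wide: one score column per value of visit
visits_wide <- reshape(visits_long,
                       idvar     = "id",
                       timevar   = "visit",
                       v.names   = "score",
                       direction = "wide")
print(visits_wide)
```

Here times supplies the values stored in the new visit column, and timevar names that column; without them reshape() falls back to the defaults 1, 2, ... and "time".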
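■ Functions in the reshape2 package
◆ melt(): arguments: data = the data frame, id.vars = the identifier variable(s)
library(reshape2)
new_iris <- melt(data = iris,id.vars = 'Species')
◆ dcast(): arguments: (data, formula = row variable ~ column variable, fun.aggregate = the aggregation function, e.g. mean, value.var = the variable to aggregate). dcast() is a very powerful function.
dcast(new_iris,formula = Species~variable,fun.aggregate = mean,value.var = 'value')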
◆ Example with the tips dataset (shipped with reshape2)
dcast(tips,formula = sex~.,fun.aggregate = mean,value.var = 'tip') # mean tip by sex (the . is a placeholder, since there is only one grouping variable)
sex .
1 Female 2.833448
2 Male 3.089618
dcast(data = tips,formula = sex~smoker,fun.aggregate = mean,value.var = 'tip') # mean tip by sex and smoking status
sex No Yes
1 Female 2.773519 2.931515
2 Male 3.113402 3.051167
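The Species ~ variable summary produced by melt() and dcast() can be cross-checked without reshape2. This is a base R sketch only, not the reshape2 implementation; long_iris and means are names chosen here for illustration.

```r
# Stack the four numeric columns of iris into long format (base R only)
long_iris <- stack(iris[, 1:4])
long_iris$Species <- rep(iris$Species, times = 4)

# Mean of the stacked values for each Species (rows) and measurement (columns)
means <- tapply(long_iris$values,
                list(long_iris$Species, long_iris$ind),
                mean)
print(means)
```

The result is a matrix rather than a data frame, but its cells should match the corresponding cells of the dcast() table.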
Converting variables to factors (that is, turning a continuous variable into a categorical one)
■ Formula method
◆ Example 1:
> age <- sample(20:80,20)
> age
[1] 49 64 63 75 74 79 45 66 28 76 60 33 39 77 35 44 31 38 24 53
> age1 <- 1+ (age >30) +(age >40) +(age > 50)
> age1
[1] 3 4 4 4 4 4 3 4 1 4 4 2 2 4 2 3 2 2 1 4
> age_fac <- factor(age1,labels = c('young','middle','m-old','old'))
> age_fac
[1] m-old old old old old old m-old old young old old middle middle
[14] old middle m-old middle middle young old
Levels: young middle m-old old
◆ Example 2: gives the same result as Example 1 on this sample
> age2 <- 1*(age < 30) + 2*(age >=30 & age < 40) + 3*(age >=40 & age <50)+4*(age>=50)
> age2
[1] 3 4 4 4 4 4 3 4 1 4 4 2 2 4 2 3 2 2 1 4
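The two formulas agree on the sample above, but they treat ages of exactly 30, 40, or 50 differently, because the first uses strict > comparisons and the second uses >=. The sketch below checks this on a made-up vector of boundary ages and also shows findInterval() as a compact base R equivalent of the second formula; all variable names are chosen here for illustration.

```r
# Boundary ages chosen on purpose; ages is a hypothetical vector
ages <- c(29, 30, 31, 40, 50, 51)

# Formula from Example 1 (strict comparisons)
code_strict <- 1 + (ages > 30) + (ages > 40) + (ages > 50)

# Formula from Example 2 (left-closed intervals)
code_inclusive <- 1 * (ages < 30) + 2 * (ages >= 30 & ages < 40) +
                  3 * (ages >= 40 & ages < 50) + 4 * (ages >= 50)

# findInterval() counts how many cut points are <= each age,
# so adding 1 reproduces the Example 2 coding
code_interval <- findInterval(ages, c(30, 40, 50)) + 1

print(rbind(ages, code_strict, code_inclusive, code_interval))
```

Whichever form is used, it is worth deciding up front which group a boundary age belongs to and testing those values explicitly.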
■ cut(): very commonly used
◆ Example 1: breaks = 4 splits the range of age into four intervals of equal width
> age3 <- cut(age,breaks = 4,labels = c('young','middle','m-old','old'),include.lowest = TRUE,
+ right = TRUE)
> age3
[1] middle m-old m-old old old old middle old young old m-old young middle old
[15] young middle young middle young m-old
Levels: young middle m-old old
◆ Example 2: explicit break points at 20, 40, 60, and 80
> age4 <- cut(age,breaks = seq(20,80,length.out = 4),labels = c('young',
+ 'middle','old'))
> age4
[1] middle old old old old old middle old young old middle young young old
[15] young middle young young young middle
Levels: young middle old
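With explicit break points, which side of each interval is closed matters. The following sketch, on a made-up vector of boundary ages, compares the default right-closed intervals with include.lowest = TRUE and with right = FALSE; the variable names are chosen here for illustration.

```r
ages <- c(20, 29, 30, 39, 40, 50, 80)  # hypothetical boundary values

# Default right = TRUE gives intervals (20,40], (40,60], (60,80];
# 20 falls outside the first interval and becomes NA
default_cut <- cut(ages, breaks = c(20, 40, 60, 80),
                   labels = c("young", "middle", "old"))

# include.lowest = TRUE closes the first interval: [20,40]
closed_cut <- cut(ages, breaks = c(20, 40, 60, 80),
                  labels = c("young", "middle", "old"),
                  include.lowest = TRUE)

# right = FALSE gives left-closed intervals [20,30), [30,40), [40,50), [50,Inf)
left_cut <- cut(ages, breaks = c(20, 30, 40, 50, Inf), right = FALSE,
                labels = c("young", "middle", "m-old", "old"))

print(data.frame(ages, default_cut, closed_cut, left_cut))
```

Since the sample in these notes is drawn from 20:80, an age of exactly 20 is possible, so include.lowest = TRUE is a reasonable safeguard when 20 is used as the first break.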
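■ ifelse(): the first argument, test, is the condition to check; the second, yes, gives the value where the condition is TRUE; the third, no, gives the value where it is FALSE. Handy and very commonly used.
◆ Example 1:
> ifelse(age > 50,'old','young')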
[1] "young" "old" "old" "old" "old" "old" "young"
[8] "old" "young" "old" "old" "young" "young" "old"
[15] "young" "young" "young" "young" "young" "old"
◆ Example 2:
> ifelse(age >60,'old',ifelse(age <30,'young',ifelse ((age >= 30 & age < 45),'m-young','m-old')))
[1] "m-old" "old" "old" "old" "old"
[6] "old" "m-old" "old" "young" "old"
[11] "m-old" "m-young" "m-young" "old" "m-young"
[16] "m-young" "m-young" "m-young" "young" "m-old"
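ifelse() returns a character vector here, not a factor. As a small sketch on made-up ages, wrapping the result in factor() with explicit levels turns it into a categorical variable whose categories are in a meaningful order; labels_chr and age_group are names chosen here for illustration.

```r
ages <- c(24, 33, 49, 64)  # hypothetical values

# Same nesting logic as Example 2; the inner age >= 30 check is implied
# because ages below 30 were already handled
labels_chr <- ifelse(ages > 60, "old",
              ifelse(ages < 30, "young",
              ifelse(ages < 45, "m-young", "m-old")))

# factor() with explicit levels fixes the category order
# (the default order would be alphabetical)
age_group <- factor(labels_chr,
                    levels = c("young", "m-young", "m-old", "old"))

print(age_group)
print(table(age_group))
```

Setting the levels explicitly also keeps categories that happen to be empty in a given sample, which makes tables comparable across samples.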
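■ recode() in the car package: arguments: var = the variable to recode, recodes = a string of recoding rules separated by semicolons
◆ Example
> library(car)
> recode(var = age, recodes ='20:29 = 1;30:39 = 2;40:49 = 3;50:hi = 4')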
[1] 3 4 4 4 4 4 3 4 1 4 4 2 2 4 2 3 2 2 1 4