1. 程式人生 > >單細胞分析實錄(8): 展示marker基因的4種圖形(一)

單細胞分析實錄(8): 展示marker基因的4種圖形(一)

今天的內容講講單細胞文章中經常出現的展示細胞marker的圖:tsne/umap圖、熱圖、堆疊小提琴圖、氣泡圖,每個圖我都會用兩種方法繪製。 > 使用的資料來自文獻:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月發表在Cell Research上的關於鼻咽癌的文章,資料下載:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430 ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222848618-152201216.png) 髓系細胞數量相對少一些,為了方便演示,選它作為例子。 ``` library(Seurat) library(tidyverse) library(harmony) Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F) Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F) ``` > 匯入資料的時候需要注意一個地方:從cell ranger得到的矩陣,每一列的列名會在CB後面加上"-1"這個字串,在R裡面匯入資料時,會自動轉化為".1",在做匹配的時候需要注意一下。我已經提前轉換為"_1" ``` > summary(as.numeric(Myeloid["CD14",])) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 0.000 0.689 1.080 2.111 4.500 > summary(as.numeric(Myeloid["PTPRC",])) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 1.200 1.681 1.607 2.124 3.520 ``` 考慮到下載的表達矩陣,表達值都不是整數,且大於0,推測該矩陣已經經過了標準化,因此下面的流程會跳過這一步 ### 0. Seurat流程 我們直接把註釋結果賦值到[email protected]矩陣中,後面省去聚類這一步 ``` mye.seu=CreateSeuratObject(Myeloid) [email protected]$CB=rownames([email protected]) [email protected]=merge([email protected],Myeloid_anno,by="CB") rownames([email protected])[email protected]$CB #替代LogNormalize這一步 mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000) mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu)) mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE) mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE) mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20) mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20) #少了聚類 DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme( axis.line = element_blank(), axis.ticks = element_blank(),axis.text = element_blank() ) ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222849738-1322101526.png) 基本符合原圖,三個亞群分得開,三個亞群分不開。 *** 接下來,我用4種方式展示marker基因,這些基因可以在文獻的補充材料裡面找到。 ### 1. tsne展示marker基因 ``` FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+ scale_x_continuous("")+scale_y_continuous("")+ theme_bw()+ #改變ggplot2的主題 theme( #進一步修改主題 panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景線 axis.ticks = element_blank(),axis.text = element_blank(), #去掉座標軸刻度和數字 legend.position = "none", #去掉圖例 plot.title = element_text(hjust = 0.5,size=14) #改變標題位置和字型大小 ) ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850349-101913774.png) 另一種方法就是把tsne的座標和基因的表達值提取出來,用ggplot2畫,其實不是很必要,因為FeaturePlot也是基於ggplot2的,我還是演示一下 ``` mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",]) colnames(mat1)="exp" mat2=Embeddings(mye.seu,"tsne") mat3=merge(mat2,mat1,by="row.names") #資料格式如下: > head(mat3) Row.names tSNE_1 tSNE_2 exp 1 N01_AAACGGGCATTTCAGG_1 5.098727 32.748145 0.000 2 N01_AAAGATGCAATGTAAG_1 -24.394040 26.176422 0.000 3 N01_AACTCAGGTAATAGCA_1 11.856730 8.086553 0.000 4 N01_AACTCAGGTCTTCGTC_1 10.421878 12.660407 0.000 5 N01_AACTTTCAGGCCATAG_1 33.555756 -10.437406 1.606 6 N01_AAGACCTTCGAATGGG_1 -23.976967 11.897753 0.738 mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+ scale_color_gradient(low = "grey",high = "purple")+theme_bw() ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850898-20484357.png) 用ggplot2的好處就是圖形修改很方便,畢竟ggplot2大家都很熟悉 ### 2. 熱圖展示marker基因 畫圖前,需要給每個細胞一個身份,因為我們跳過了聚類這一步,此處需要手動賦值 ``` Idents(mye.seu)="celltype" library(xlsx) markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1) markerdf1$gene=as.character(markerdf1$gene) # 這個表格整理自原文的附表,選了53個基因 #資料格式 # > head(markerdf1) # gene celltype # 1 S100B DC2(CD1C+) # 2 HLA-DQB2 DC2(CD1C+) # 3 FCER1A DC2(CD1C+) # 4 CD1A DC2(CD1C+) # 5 PKIB DC2(CD1C+) # 6 NDRG2 DC2(CD1C+) DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data") ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm") ``` label = F不在熱圖的上方標註細胞型別, slot = "scale.data"使用scale之後的矩陣畫圖,預設就是這個 ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851364-2051865684.png) 接下來用pheatmap畫,在佈局上可以自由發揮 ``` library(pheatmap) [email protected][,c("CB","celltype")] colanno=colanno%>%arrange(celltype) rownames(colanno)=colanno$CB colanno$CB=NULL colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype)) ``` 先對細胞進行排序,按照celltype的順序,然後對基因排序 ``` rowanno=markerdf1 rowanno=rowanno%>%arrange(celltype) ``` 提取scale矩陣的行列時,按照上面的順序 ``` mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)] mat4[mat4>=2.5]=2.5 mat4[mat4 < (-1.5)]= -1.5 #小於負數時,加括號! ``` 下面就是繪圖程式碼了,我加了分界線,使其看上去更有區分度 ``` pheatmap(mat4,cluster_rows = F,cluster_cols = F, show_colnames = F, annotation_col = colanno, gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]), gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]), filename="heatmap.2.pdf",width=11,height = 7 ) ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851925-2023792964.png) *** 先寫到這兒吧(原本以為能寫完的),剩下的氣泡圖、堆疊小提琴圖改天再補上。 > 因水平有限,有錯誤的地方,歡迎批評