ANNOVAR Annotation Guide [reposted from http://blog.csdn.net/u013816205/article/details/51262289]

阿新 • Published: 2017-06-22


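ANNOVAR is a command-line tool written in Perl that runs on any operating system with a Perl interpreter installed. It accepts several input formats, including the widely used VCF format, and can write several output formats, including annotated VCF files and tab- or comma-delimited text files. ANNOVAR rapidly annotates genetic variants and predicts their functional effects. Similar variant annotation tools include VEP, snpEff, VAAST, and AnnTools.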

ANNOVAR supports three types of annotation: gene-based, region-based, and filter-based. Each addresses a different aspect of a variant. Gene-based annotation reveals how the variant relates to known genes and what functional effect it has on them. Region-based annotation reveals how the variant relates to specific genomic regions, for example whether it falls in a known conserved region. Filter-based annotation provides information such as the variant's population frequency in different populations and various types of variant-deleteriousness prediction scores, which can be used to filter out common and probably (i.e., most likely) nondeleterious variants.
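To make the filtering idea concrete, here is a minimal Python sketch that keeps only rare variants. It assumes each annotated variant is a dict with a population-frequency value under a key such as '1000g2014oct_eur' (assumed to match the column name ANNOVAR writes for that protocol), and that a missing value appears as '.' or an empty string. The 1% cutoff is an arbitrary example, not an ANNOVAR default, and the gene names are hypothetical.

```python
def is_rare(variant, freq_key="1000g2014oct_eur", max_freq=0.01):
    """Return True if the variant's population frequency is below max_freq.

    Missing values ('.' or an empty string) are treated as rare, on the
    assumption that a variant absent from the population database is not common.
    """
    value = variant.get(freq_key, "")
    if value in (".", ""):
        return True
    return float(value) < max_freq


# Toy records with hypothetical gene names and made-up frequencies.
variants = [
    {"Gene.refGene": "GENE_A", "1000g2014oct_eur": "0.35"},
    {"Gene.refGene": "GENE_B", "1000g2014oct_eur": "0.002"},
    {"Gene.refGene": "GENE_C", "1000g2014oct_eur": "."},
]
rare = [v["Gene.refGene"] for v in variants if is_rare(v)]
print(rare)  # ['GENE_B', 'GENE_C']
```

In practice such a filter would usually combine several frequency columns (for example 1000 Genomes and ExAC) before looking at deleteriousness scores.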

(A) Annotating human genome variants with ANNOVAR
(i) Fill in the registration form, download the ANNOVAR package 'annovar.latest.tar.gz' from http://annovar.openbioinformatics.org/, and unpack it:

tar xvfz annovar.latest.tar.gz

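Critical: you can also add the ANNOVAR directory to your operating system's PATH environment variable, so that the ANNOVAR scripts can be run by typing the command name directly.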
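(ii) Download all required annotation databases. The databases for gene-based annotation are already included in the ANNOVAR package; for other annotations, download the databases into the 'humandb/' directory with the following commands: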

perl annotate_variation.pl --downdb --buildver hg19 cytoBand humandb/
perl annotate_variation.pl --downdb --webfrom annovar --buildver hg19 1000g2014oct humandb/
perl annotate_variation.pl --downdb --webfrom annovar --buildver hg19 exac03 humandb/
perl annotate_variation.pl --downdb --webfrom annovar --buildver hg19 ljb26_all humandb/

perl annotate_variation.pl --downdb --webfrom annovar --buildver hg19 clinvar_20140929 humandb/
perl annotate_variation.pl --downdb --webfrom annovar --buildver hg19 snp138 humandb/
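The six commands above follow one pattern, which this Python sketch restates as data. It makes the rule from the notes below explicit: 'cytoBand' is fetched from UCSC without '--webfrom annovar', while the others come from the ANNOVAR server. The helper name download_command and the UCSC_DATABASES set are hypothetical; the script only builds and prints the command lines, and actually running them (for example with subprocess.run) assumes perl and annotate_variation.pl are available in the current directory.

```python
import shlex

UCSC_DATABASES = {"cytoBand"}  # fetched from UCSC, so no "--webfrom annovar"


def download_command(db, buildver="hg19", dbdir="humandb/"):
    """Build an annotate_variation.pl --downdb command as an argument list."""
    cmd = ["perl", "annotate_variation.pl", "--downdb"]
    if db not in UCSC_DATABASES:
        cmd += ["--webfrom", "annovar"]
    cmd += ["--buildver", buildver, db, dbdir]
    return cmd


databases = ["cytoBand", "1000g2014oct", "exac03", "ljb26_all",
             "clinvar_20140929", "snp138"]
commands = [download_command(db) for db in databases]
for cmd in commands:
    print(shlex.join(cmd))
    # To actually download, one could run: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.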

These are several commonly used databases:

1. 'cytoBand': the chromosome coordinates of each cytogenetic band;

2. '1000g2014oct': alternative allele frequencies in the 1000 Genomes Project (October 2014 version). Like ExAC, the 1000 Genomes Project is a public, open database;

3. 'exac03': the variants reported in the Exome Aggregation Consortium (ExAC, version 0.3);

4. 'ljb26_all': various functional deleteriousness prediction scores from the dbNSFP database (version 2.6). See "dbNSFP: A Lightweight Database of Human Nonsynonymous SNPs and Their Functional Predictions" on ResearchGate;

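5. 'clinvar_20140929': the variants reported in the ClinVar database (version 20140929).

ClinVar is a free public database that the US National Center for Biotechnology Information (NCBI) announced in November 2012 and officially launched in April 2013. As a core database, ClinVar integrates more than ten different types of databases, describes diseases with standard nomenclature, and lets researchers download the data locally for more customized research. NCBI and various research groups had already built many databases on genetic variation and clinical phenotypes, leaving the information scattered. ClinVar aims to integrate this scattered data across four kinds of information (variants, clinical phenotypes, supporting evidence, and functional annotation and analysis) and, through expert review, gradually build a standard, reliable, and stable database of variant-phenotype relationships.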

6. 'snp138': the dbSNP database (version 138).
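Notes:
1. The first command does not include the '--webfrom annovar' option, so that file is downloaded from the UCSC Genome Browser annotation database;

2. The '--buildver hg19' option selects the hg19 genome build;

3. After running the commands above, several new files prefixed with 'hg19' appear in the 'humandb/' directory.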
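A quick way to confirm the downloads is to list the build-prefixed files. This Python sketch assumes the prefix is followed by an underscore (e.g. 'hg19_cytoBand.txt'); the function name list_build_files and the demo file names are hypothetical, and the demo uses a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_build_files(dbdir, buildver="hg19"):
    """Return the sorted names of files in dbdir that start with '<buildver>_'."""
    return sorted(p.name for p in Path(dbdir).iterdir()
                  if p.is_file() and p.name.startswith(buildver + "_"))


# Demo on a throwaway directory; in practice call list_build_files("humandb/").
with tempfile.TemporaryDirectory() as tmp:
    for name in ("hg19_cytoBand.txt", "hg19_snp138.txt", "notes.txt"):
        (Path(tmp) / name).write_text("")
    found = list_build_files(tmp)
print(found)  # ['hg19_cytoBand.txt', 'hg19_snp138.txt']
```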

(iii) Annotate the variants with 'table_annovar.pl'. It lets you combine several annotation types in one command and choose the order in which they appear in the output.

Run the following command to annotate the variants in a VCF file with the databases downloaded above:

perl table_annovar.pl <variant.vcf> humandb/ --outfile final --buildver hg19 --protocol refGene,cytoBand,1000g2014oct_eur,1000g2014oct_afr,exac03,ljb26_all,clinvar_20140929,snp138 --operation g,r,f,f,f,f,f,f --vcfinput

<variant.vcf> is the name of the input VCF file;

'--protocol' is followed by the exact names of the annotation databases;

'--operation' is followed by the annotation types: 'g' for gene-based annotation, 'r' for region-based annotation, and 'f' for filter-based annotation;

'--outfile' sets the prefix of the output files.
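One way to keep '--protocol' and '--operation' aligned is to store each database next to its annotation type and join them only when building the command. This Python sketch reproduces the command above from such pairs; the helper name table_annovar_command is hypothetical, and 'variant.vcf' stands in for your input file.

```python
def table_annovar_command(vcf, dbdir, pairs, outfile="final", buildver="hg19"):
    """Build a table_annovar.pl command from (protocol, operation) pairs.

    Keeping each database next to its annotation type means the
    --protocol and --operation lists cannot drift out of order.
    """
    protocols = ",".join(protocol for protocol, _ in pairs)
    operations = ",".join(operation for _, operation in pairs)
    return ["perl", "table_annovar.pl", vcf, dbdir,
            "--outfile", outfile, "--buildver", buildver,
            "--protocol", protocols, "--operation", operations, "--vcfinput"]


pairs = [
    ("refGene", "g"), ("cytoBand", "r"),
    ("1000g2014oct_eur", "f"), ("1000g2014oct_afr", "f"),
    ("exac03", "f"), ("ljb26_all", "f"),
    ("clinvar_20140929", "f"), ("snp138", "f"),
]
cmd = table_annovar_command("variant.vcf", "humandb/", pairs)
print(" ".join(cmd))
```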
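CRITICAL STEP: 1. Make sure the database names are spelled correctly and listed in the order in which you want them to appear in the output file;

2. Make sure the order of the annotation types after '--operation' matches the order of the databases after '--protocol';

3. Make sure protocol names and annotation types are separated by exactly one comma, with no whitespace.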
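The three checks in this critical step can be automated before launching a long run. This Python sketch is a hypothetical pre-flight helper, not part of ANNOVAR: it only knows the three operation codes described in this post (g, r, f) and cannot tell whether a database name is actually spelled correctly.

```python
import re

KNOWN_OPERATIONS = {"g", "r", "f"}  # only the three types described in this post


def check_protocol_operation(protocol, operation):
    """Return a list of problems; an empty list means the two strings look consistent."""
    problems = []
    for label, value in (("protocol", protocol), ("operation", operation)):
        if re.search(r"\s", value):
            problems.append(f"{label} contains whitespace")
        if ",," in value or value.startswith(",") or value.endswith(","):
            problems.append(f"{label} has an empty item (extra comma)")
    protocols = protocol.split(",")
    operations = operation.split(",")
    if len(protocols) != len(operations):
        problems.append(f"{len(protocols)} protocols but {len(operations)} operations")
    unknown = sorted({op.strip() for op in operations if op.strip()} - KNOWN_OPERATIONS)
    if unknown:
        problems.append("unknown operation codes: " + ", ".join(unknown))
    return problems


print(check_protocol_operation("refGene,cytoBand", "g,r"))         # []
print(check_protocol_operation("refGene, cytoBand", "g,r"))        # ['protocol contains whitespace']
print(check_protocol_operation("refGene,cytoBand,snp138", "g,r"))  # ['3 protocols but 2 operations']
```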

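(iv) 'final.hg19_multianno.vcf'. This output is a VCF file whose INFO column holds 'key=value' pairs separated by ';', e.g. 'Func.refGene=intronic;Gene.refGene=SAMD11'. Each key-value pair is one piece of ANNOVAR annotation. The file can be processed further with genetic analysis software designed for VCF files.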
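Reading these annotations back only requires splitting the INFO string. This minimal Python sketch turns it into a dict, mapping flag entries without '=' to True. The function name parse_info is hypothetical, and ANNOVAR may escape special characters inside values; this sketch leaves values undecoded.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys without '=' map to True."""
    result = {}
    for field in info.split(";"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        result[key] = value if sep else True
    return result


info = "Func.refGene=intronic;Gene.refGene=SAMD11"
print(parse_info(info))  # {'Func.refGene': 'intronic', 'Gene.refGene': 'SAMD11'}
```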

(v) 'final.hg19_multianno.txt'. Each line is one variant. Columns are tab-delimited; the extra columns hold the annotations, in the order given by the '--protocol' argument.
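Because the text output is plain tab-delimited data with a header line, Python's csv module can read it directly. In this sketch the column names Chr, Start, End, Ref, Alt, Func.refGene and Gene.refGene are assumed to match ANNOVAR's header, and the rows are toy data with hypothetical genes. With '--vcfinput', trailing columns may outnumber the header names; csv.DictReader then collects the extras in a list under the key None.

```python
import csv
import io


def read_multianno(handle):
    """Yield one dict per variant from a tab-delimited *_multianno.txt stream."""
    yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


# Toy input with hypothetical genes; for a real file pass
# open("final.hg19_multianno.txt", newline="") as the handle.
sample = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\n"
    "1\t1000\t1000\tA\tG\texonic\tGENE_A\n"
    "1\t2000\t2000\tC\tT\tintronic\tGENE_B\n"
)
rows = list(read_multianno(io.StringIO(sample)))
exonic = [row["Gene.refGene"] for row in rows if row["Func.refGene"] == "exonic"]
print(exonic)  # ['GENE_A']
```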
(B) Gene-based annotation of non-human species with ANNOVAR
CRITICAL STEP: this example annotates the chimpanzee genome (genome build identifier panTro2). Install ANNOVAR as in A(i).

For gene-based annotation, ANNOVAR needs a gene definition file in genePred format and a transcript sequence file in FASTA format.
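To show what a genePred record contains, here is a Python sketch that parses the first ten standard genePred columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). UCSC-style refGene tables carry an extra leading 'bin' column, handled here by the has_bin flag; extended columns at the end are ignored. The class and function names are hypothetical and the record is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePred:
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def _int_list(field):
    """Parse a comma-separated coordinate list such as '100,300,'."""
    return [int(x) for x in field.split(",") if x]


def parse_genepred(line, has_bin=False):
    """Parse the first ten genePred columns; extra trailing columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if has_bin:  # UCSC-style refGene tables start with an extra 'bin' column
        fields = fields[1:]
    tx_start, tx_end, cds_start, cds_end, exon_count = map(int, fields[3:8])
    starts, ends = _int_list(fields[8]), _int_list(fields[9])
    if not len(starts) == len(ends) == exon_count:
        raise ValueError(f"exon count mismatch in {fields[0]}")
    return GenePred(fields[0], fields[1], fields[2], tx_start, tx_end,
                    cds_start, cds_end, starts, ends)


line = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
print(parse_genepred(line))
```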
(i) Run the following commands to download the chimpanzee gene definition file and genome sequence FASTA files into the 'chimpdb/' directory:

perl annotate_variation.pl --downdb --buildver panTro2 gene chimpdb/
perl annotate_variation.pl --downdb --buildver panTro2 seq chimpdb/panTro2_seq

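(ii) Note that the ANNOVAR database only provides pre-built transcript sequences for the human genome, not for other species, so you must build the transcript FASTA file for your species yourself with the following command: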

perl retrieve_seq_from_fasta.pl chimpdb/panTro2_refGene.txt --seqdir chimpdb/panTro2_seq --format refGene --outfile chimpdb/panTro2_refGeneMrna.fa

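1. '--seqdir' gives the directory containing the downloaded sequence files;

2. '--format' gives the format of the gene definition file;

3. '--outfile' sets the name of the output mRNA sequence file.
Critical: the file name after '--outfile' must have the form '<buildver>_refGeneMrna.fa'; otherwise the next step cannot find the correct transcript FASTA sequence file.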
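Since annotation silently depends on these names, it can help to check for both files before running table_annovar.pl. This Python sketch assumes the two names follow the pattern used in this post ('<buildver>_refGene.txt' and '<buildver>_refGeneMrna.fa'); the helper names are hypothetical and the demo uses a temporary directory.

```python
import tempfile
from pathlib import Path


def gene_db_files(dbdir, buildver):
    """Return the refGene definition and transcript FASTA paths for a build."""
    d = Path(dbdir)
    return [d / f"{buildver}_refGene.txt", d / f"{buildver}_refGeneMrna.fa"]


def missing_gene_db_files(dbdir, buildver):
    """List the expected file names that do not exist yet."""
    return [p.name for p in gene_db_files(dbdir, buildver) if not p.exists()]


# Demo: only the gene definition file exists, so the FASTA is reported missing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "panTro2_refGene.txt").write_text("")
    missing = missing_gene_db_files(tmp, "panTro2")
print(missing)  # ['panTro2_refGeneMrna.fa']
```

The same check applies to the Arabidopsis example later in this post with buildver 'AT'.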

(iii) Annotate the variants with the chimpanzee gene annotation:

perl table_annovar.pl <variant.vcf> chimpdb/ --vcfinput --outfile final --buildver panTro2 --protocol refGene --operation g

Here <variant.vcf> is the input VCF file and 'chimpdb/' is the directory of the downloaded data.

(iv) Check the output file 'final.panTro2_multianno.txt'. The chimpanzee gene annotation is added after the input variant columns.
Critical: if no ready-made gene definition file is available, you can convert a GFF3 or GTF file produced by a gene prediction tool into a gene definition file.

As an example, here is how to build the files needed to annotate Arabidopsis thaliana:
#1. Download the Arabidopsis GTF file and genome FASTA file from http://plants.ensembl.org/info/website/ftp/index.html into the 'atdb' directory:

mkdir atdb
cd atdb
wget ftp://ftp.ensemblgenomes.org/pub/release-27/plants/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.27.dna.genome.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/release-27/plants/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.27.gtf.gz

#2. Decompress the files:

gunzip Arabidopsis_thaliana.TAIR10.27.dna.genome.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.27.gtf.gz

#3. Download the gff3ToGenePred or gtfToGenePred tool (http://hgdownload.soe.ucsc.edu/admin/exe/Linux.x86_64/). GTF is recommended, because some GFF3 files may not convert correctly.

#4. Convert the GTF file into a GenePred file with gtfToGenePred:

gtfToGenePred -genePredExt Arabidopsis_thaliana.TAIR10.27.gtf AT_refGene.txt
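To illustrate what this conversion does, here is a simplified Python sketch that groups GTF exon records by transcript_id into genePred-like rows. It is not a replacement for gtfToGenePred: it skips CDS bounds and extended columns, and it assumes every exon line carries a transcript_id attribute. The key detail is the coordinate shift, since GTF is 1-based and closed while genePred starts are 0-based (half-open). The demo records are made up.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons_to_genepred(lines):
    """Group GTF exon records by transcript_id into simplified genePred rows.

    Each row is (name, chrom, strand, txStart, txEnd, exonCount,
    exonStarts, exonEnds). CDS bounds are not computed in this sketch.
    """
    exons = defaultdict(list)
    location = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs["transcript_id"]
        # GTF is 1-based and closed, so subtract 1 from the start only.
        exons[tid].append((int(cols[3]) - 1, int(cols[4])))
        location[tid] = (cols[0], cols[6])
    rows = []
    for tid in sorted(exons):
        chrom, strand = location[tid]
        intervals = sorted(exons[tid])
        starts = "".join(f"{s}," for s, _ in intervals)
        ends = "".join(f"{e}," for _, e in intervals)
        rows.append((tid, chrom, strand, intervals[0][0],
                     max(e for _, e in intervals), len(intervals), starts, ends))
    return rows


gtf = [
    '1\tdemo\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tdemo\tCDS\t121\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    '1\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(gtf_exons_to_genepred(gtf))
# [('T1', '1', '+', 100, 400, 2, '100,300,', '200,400,')]
```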

#5. Generate the transcript FASTA file with retrieve_seq_from_fasta.pl:

perl ../retrieve_seq_from_fasta.pl --format refGene --seqfile Arabidopsis_thaliana.TAIR10.27.dna.genome.fa AT_refGene.txt --outfile AT_refGeneMrna.fa
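Conceptually, building a transcript FASTA means cutting each transcript's exons out of the genome and joining them, reverse-complementing transcripts on the minus strand. This Python sketch illustrates that idea on a toy genome; it is not ANNOVAR's implementation of retrieve_seq_from_fasta.pl, and the function names are hypothetical. It assumes the FASTA sequence id is the first word after '>' and that exon coordinates are 0-based, half-open, as in genePred.

```python
import io

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a FASTA stream into {sequence_id: sequence}; the id is the first word after '>'."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def transcript_seq(genome, chrom, strand, exon_starts, exon_ends):
    """Join exon sequences (0-based, half-open) and reverse-complement on the '-' strand."""
    seq = "".join(genome[chrom][s:e] for s, e in zip(exon_starts, exon_ends))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


genome = read_fasta(io.StringIO(">chr1 toy\nAACCGGTT\nACGT\n"))
print(transcript_seq(genome, "chr1", "+", [2, 8], [4, 10]))  # CCAC
print(transcript_seq(genome, "chr1", "-", [2, 8], [4, 10]))  # GTGG
```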

After this step, the database files needed for gene-based annotation are ready, and you can annotate a VCF file following the procedure from B(iii). Note that '--buildver' should be set to 'AT'.

See http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ for more details. The other arguments are the same as for human genome annotation.
