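Supplementary Notes on Using BLAST+
BLAST is probably the most widely used tool for local alignment of short sequences, but it has a great many parameters, and there are some I had never looked at carefully. My notes on them follow: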
Adding more information to BLAST output
The -outfmt parameter controls BLAST's output format. Running blastp -help lists every available format, as shown below:
 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report

   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
        qseqid means Query Seq-id
        qgi means Query GI
        qacc means Query accesion
        qaccver means Query accesion.version
        qlen means Query sequence length
        sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
        sgi means Subject GI
        sallgi means All subject GIs
        sacc means Subject accession
        saccver means Subject accession.version
        sallacc means All subject accessions
        slen means Subject sequence length
        qstart means Start of alignment in query
        qend means End of alignment in query
        sstart means Start of alignment in subject
        send means End of alignment in subject
        qseq means Aligned part of query sequence
        sseq means Aligned part of subject sequence
        evalue means Expect value
        bitscore means Bit score
        score means Raw score
        length means Alignment length
        pident means Percentage of identical matches
        nident means Number of identical matches
        mismatch means Number of mismatches
        positive means Number of positive-scoring matches
        gapopen means Number of gap openings
        gaps means Total number of gaps
        ppos means Percentage of positive-scoring matches
        frames means Query and subject frames separated by a '/'
        qframe means Query frame
        sframe means Subject frame
        btop means Blast traceback operations (BTOP)
        staxid means Subject Taxonomy ID
        ssciname means Subject Scientific Name
        scomname means Subject Common Name
        sblastname means Subject Blast Name
        sskingdom means Subject Super Kingdom
        staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
        scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
        stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
        sstrand means Subject Strand
        qcovs means Query Coverage Per Subject
        qcovhsp means Query Coverage Per HSP
        qcovus means Query Coverage Per Unique Subject (blastn only)
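In practice, the two formats we use most are -outfmt 5 and -outfmt 6. The former produces XML and the latter produces tab-delimited text. The XML format was covered in an earlier post on interpreting the Blast+ XML format (it carries more complete information and is useful in a wider range of situations), while the tabular format is the one used most often day to day (and the one many downstream tools expect as input).
The columns of the tabular format are as follows (compare them with the descriptions above):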
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
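Because -outfmt 6 prints no header row, it can be handy to prepend the column names yourself. The following is only a minimal sketch: the function name add_header, the file name sample.outfmt6 and the sample row are made up for illustration, and the column list assumes the default 12 columns shown above.

```shell
#!/bin/sh
# Default column names of -outfmt 6, in order.
COLS="qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# Hypothetical helper: print a tab-separated header, then the table.
add_header() {
    # $1 = tabular BLAST output without a header
    printf '%s\n' "$COLS" | tr ' ' '\t'
    cat "$1"
}

# Made-up sample row standing in for real blastp output.
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t390\n' > sample.outfmt6

add_header sample.outfmt6
```

If a custom column list is used, COLS would need to be changed to match it.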
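Sometimes, though, these 12 columns are not enough. For example, I also want the coverage of each hit (qcovs: Query Coverage Per Subject).
To get it, simply list the specifiers of the columns you want in the BLAST command. For example, to add coverage on top of the default outfmt 6 columns: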
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs"
Note: add as many columns as you need, separated by spaces.
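As a rough illustration of what the extra column makes possible, the sketch below filters a 13-column table by qcovs using awk. The function name filter_by_qcovs, the file name hits.tsv, the sample rows and the 80% cutoff are all made up for illustration. The only thing it assumes about BLAST is that qcovs is the 13th column, as in the -outfmt string above.

```shell
#!/bin/sh
# Hypothetical helper: keep hits whose query coverage (column 13, qcovs)
# is at least the given percentage. The column position follows the
# -outfmt string above; the cutoff is an arbitrary example value.
filter_by_qcovs() {
    # $1 = tabular BLAST output, $2 = minimum qcovs (percent)
    awk -F '\t' -v min="$2" '$13 + 0 >= min + 0' "$1"
}

# Made-up sample rows standing in for real blastp output.
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t390\t95\n' >  hits.tsv
printf 'q1\ts2\t60.0\t80\t32\t0\t10\t89\t1\t80\t1e-10\t70\t38\n'    >> hits.tsv

# Keep only hits covering at least 80% of the query.
filter_by_qcovs hits.tsv 80
```

With the sample rows above, only the hit to s1 (qcovs 95) passes the cutoff.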
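Splitting NR into sub-databases
Previously, to split NR into sub-databases I followed the method from an earlier post on building an NR sub-database and extracting the sequences of a particular taxon from NR.
NCBI has since released BLAST 2.8, which can use a taxonomy ID list generated by NCBI's own script as an index into NR and restrict the search to that subset. This is much more convenient: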
- Support for a new version of the BLAST database that allows you to limit search by taxonomy as well some other improvements.
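Of course, using this version means NR has to be downloaded again, from ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/
Usage is simple (at least simpler than the earlier method). To search against a single species only (for example human, taxid 9606), the command is: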
blastp -db nr -query query.fasta -taxids 9606 -outfmt 6 -out blast.outfm6
To search against the mammalian subset of NR, first build a taxid list for mammals (taxid 40674):
get_species_taxids.sh -t 40674 > 40674.txids
Then align the query sequences against the mammalian subset of NR:
blastp -db nr -query query.fasta -taxidlist 40674.txids -outfmt 6 -out blast.outfm6
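Before handing a taxid list to -taxidlist, it may be worth checking that the file is not empty and holds one numeric taxonomy ID per line, since a failed get_species_taxids.sh run could leave an empty or malformed file. This is only a sketch: the function name check_taxid_list, the file name example.txids and its contents are made up for illustration, and the one-ID-per-line layout is my assumption about the list format.

```shell
#!/bin/sh
# Hypothetical pre-flight check: the taxid list should be non-empty and
# contain exactly one numeric taxonomy ID per line (assumed layout).
check_taxid_list() {
    # $1 = taxid list file
    [ -s "$1" ] || { echo "empty or missing: $1" >&2; return 1; }
    if grep -qvE '^[0-9]+$' "$1"; then
        echo "non-numeric line found in $1" >&2
        return 1
    fi
}

# Made-up stand-in for the output of get_species_taxids.sh.
printf '9606\n10090\n10116\n' > example.txids

if check_taxid_list example.txids; then
    echo "ok: $(wc -l < example.txids | tr -d ' ') taxids"
fi
```

The blastp command above would then be run only when the check succeeds.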
For details, see: https://ftp.ncbi.nlm.nih.gov/blast/db/v5/blastdbv5.pdf
This article is from http://www.bioinfo-scrounger.com. Please credit the source when reposting.