
DNA sequence open reading frames (ORFs) | ORF prediction for DNA sequences


Common ORF prediction tools

Open Reading Frame Finder - NCBI

ORF Finder - SMS

OrfPredictor - YSU

Basic concepts

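An open reading frame (ORF) is a stretch of sequence that, in a given reading frame, contains no stop codon. It is the part of an organism's genome that could potentially encode a protein. Within a gene, the ORF lies between the start codon and the stop codon. Because a DNA or RNA sequence can be read in several different ways, many different open reading frames may exist at the same time. Some computer programs can identify which sequences are most likely to be protein-coding.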

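Key points:

1. A stretch of sequence that contains no stop codon;

2. The part that could potentially serve as a protein-coding sequence;

3. A sequence can be read in several different ways, so many different open reading frames may exist at the same time;

4. Some tools use BLAST alignment to increase confidence in the predictions.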

Example

Consider the sequence 5'-UCUAAAGGUCCA-3'. It can be read in three ways:

  1. UCU AAA GGU CCA
  2. CUA AAG GUC
  3. UAA AGG UCC

Because UAA is a stop codon, the third reading cannot be translated into protein, so only the first two are open reading frames.
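The three readings above can be reproduced with a few lines of Python. This is only a minimal sketch for illustration: the function names `reading_frames` and `is_open` are made up here, and the stop codon set assumes the standard genetic code written in RNA letters.

```python
# Stop codons of the standard genetic code, in RNA letters (assumption:
# other genetic codes use different sets).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reading_frames(seq):
    """Split seq into complete codons for each of the three forward frames."""
    frames = []
    for offset in range(3):
        # Stop at len(seq) - 2 so that trailing partial codons are dropped.
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


def is_open(codons):
    """A reading is 'open' here if none of its codons is a stop codon."""
    return not any(codon in STOP_CODONS for codon in codons)


seq = "UCUAAAGGUCCA"
frames = reading_frames(seq)
for number, codons in enumerate(frames, start=1):
    status = "open" if is_open(codons) else "contains a stop codon"
    print(number, " ".join(codons), "->", status)
```

Only the third reading is reported as containing a stop codon (UAA), which matches the example.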

Personally, I recommend the tool developed by NCBI; results from it tend to carry more credibility when you publish.

Below is the help text for the Linux version of the tool:

USAGE
  ORFfinder [-h] [-help] [-xmlhelp] [-in Input_File] [-id Accession_GI]
    [-b begin] [-e end] [-c circular] [-g Genetic_code] [-s Start_codon]
    [-ml minimal_length] [-n nested_ORFs] [-strand Strand] [-out Output_File]
    [-outfmt output_format] [-logfile File_Name] [-conffile File_Name]
    [-version] [-version-full] [-dryrun]

DESCRIPTION
   Searching open reading frames in a sequence

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -xmlhelp
   Print USAGE, DESCRIPTION and ARGUMENTS in XML format; ignore all other
   parameters
 -logfile <File_Out>
   File to which the program log should be redirected
 -conffile <File_In>
   Program's configuration (registry) data file
 -version
   Print version number;  ignore other arguments
 -version-full
   Print extended version data;  ignore other arguments
 -dryrun
   Dry run the application: do nothing, only test all preconditions

 *** Input query options (one of them has to be provided):
 -in <File_In>
   name of file with the nucleotide sequence in FASTA format
   (more than one sequence is allowed)
   Default = `'
 -id <String>
   Accession or gi number of the nucleotide sequence
   (ignored, if the file name is provided)
   Default = `'

 *** Query sequence details:
 -b <Integer>
   Start address of sequence fragment to be processed
   Default = `1'
 -e <Integer>
   Stop address of sequence fragment to be processed (0 - to the end of the
   sequence)
   Default = `0'
 -c <Boolean>
   Is the sequence circular? (t/f) *** Under development
   Default = `false'

 *** Search parameters:
 -g <Integer>
   Genetic code to use (1-31)
   see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details
   Default = `1'
 -s <Integer>
   ORF start codon to use:
       0 = "ATG" only
       1 = "ATG" and alternative initiation codons
       2 = any sense codon
   Default = `1'
 -ml <Integer>
   Minimal length of the ORF (nt)
   Value less than 30 is automatically changed by 30.
   Default = `75'
 -n <Boolean>
   Ignore nested ORFs (completely placed within another)
   Default = `false'
 -strand <String>
   Output ORFs on specified strand only (both|plus|minus)
   Default = `both'

 *** Output options:
 -out <File_Out>
   Output file name
 -outfmt <Integer>
   Output options:
       0 = list of ORFs in FASTA format
       1 = CDS in FASTA format
       2 = Text ASN.1
       3 = Feature table
   Default = `0'
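To get a feel for what the search parameters control, here is a toy ORF scanner in Python. It is not ORFfinder's algorithm, just a rough sketch under simplifying assumptions: only "ATG" counts as a start (similar in spirit to `-s 0`), the stop codons are those of the standard genetic code, `min_len` loosely plays the role of `-ml`, both strands are scanned (like `-strand both`), and ORFs that run off the end of the sequence without a stop codon are not reported. The names `find_orfs` and `reverse_complement` are made up for this example.

```python
# Toy ORF scanner; a sketch only, NOT how ORFfinder works internally.
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Find ATG...stop ORFs in all six frames.

    Returns tuples (strand, frame, start, end, orf_sequence). Coordinates
    are 0-based, half-open, and relative to the strand that was scanned.
    The length check includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("plus", seq), ("minus", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


demo = "CCATGAAATTTGGGTAACC"
for orf in find_orfs(demo, min_len=9):
    print(orf)
```

With the default `min_len=30` the short demo sequence yields nothing, which shows how the minimum length setting filters out small ORFs.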

  

ORFfinder -in in.fasta -s 2 -ml 100 -out test.out -outfmt 3
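According to the help text, the command above accepts any sense codon as a start (`-s 2`), keeps ORFs of at least 100 nt (`-ml 100`), and writes a feature table (`-outfmt 3`) to `test.out`. If you want to call the tool from a script, something like the following Python sketch may work. It uses only flags listed in the help text; the helper name `build_orffinder_cmd` is made up, and it assumes the `ORFfinder` binary is on your `PATH` and that `in.fasta` exists.

```python
import os
import shutil
import subprocess


def build_orffinder_cmd(fasta, out, start_codon=1, min_len=75,
                        outfmt=0, strand="both"):
    """Build the ORFfinder argument list (hypothetical helper).

    Defaults mirror the defaults shown in the help text.
    """
    return [
        "ORFfinder",
        "-in", fasta,
        "-s", str(start_codon),
        "-ml", str(min_len),
        "-strand", strand,
        "-out", out,
        "-outfmt", str(outfmt),
    ]


cmd = build_orffinder_cmd("in.fasta", "test.out",
                          start_codon=2, min_len=100, outfmt=3)
print(" ".join(cmd))

# Only run when the binary and the input file are actually available.
if shutil.which("ORFfinder") and os.path.exists("in.fasta"):
    subprocess.run(cmd, check=True)
```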

  
