org.bdgp.util
Class DNAUtils

java.lang.Object
  |
  +--org.bdgp.util.DNAUtils

public class DNAUtils
extends java.lang.Object

A collection of constants and static methods for manipulating ASCII representations of DNA sequences.


Field Summary
protected static java.lang.String[][][] aa1
          Genetic Code in 1-character amino acid codes
protected static java.lang.String[][][] aa3
          Genetic Code in 3-character amino acid codes
static int COMPLEMENT
           
static int FORWARD_SPLICED_TRANSLATION
           
static int[] FRAME_MAPPING
           
static int FRAME_NEG_ONE
           
static int FRAME_NEG_THREE
           
static int FRAME_NEG_TWO
           
static int FRAME_ONE
           
static int FRAME_THREE
           
static int FRAME_TWO
           
protected static char[] id_to_letter
          ASCII character codes for each nucleotide (or set of nucleotides).
protected static int[] letter_to_id
          ordinal numbers of nucleotides associated with each possible ASCII character code.
static int LETTERS
          number of "letters" that are valid in a string of nucleotide codes.
static int NUCLEOTIDES
           
static int ONE_LETTER_CODE
           
static int REVERSE_SPLICED_TRANSLATION
           
static int THREE_LETTER_CODE
           
 
Constructor Summary
DNAUtils()
           
 
Method Summary
static java.lang.String chunkReverse(java.lang.String s, int offset, int chunk_size)
          determines the reverse of a part of a sequence of nucleotides.
static java.lang.String complement(java.lang.String s)
          determines the complement of a sequence of nucleotides.
protected static void complementBuffer(java.lang.StringBuffer buf)
          determines the complement of a sequence of nucleotides.
static double GCcontent(java.lang.String dna)
           
static double GCcontent(java.lang.StringBuffer dna)
           
static char[] getAllowedDNACharacters()
          returns an array of all allowed characters used to represent nucleotides. This should follow the IUPAC specification, but does not yet.
static java.lang.String[][][] getGeneticCodeOne()
          gets a representation of the genetic code.
static java.lang.String[][][] getGeneticCodeThree()
          gets a representation of the genetic code.
static int[] getNACharToIdMap()
          gets a map from letters to numbers each representing nucleotides.
static char[] getNAIdToCharMap()
          gets a map from numbers to letters each representing nucleotides.
static char getResidueChar(int residue_id)
          gets a nucleotide code.
static int getResidueID(char residue_letter)
          gets an index into an array of codes for nucleotides.
static java.lang.String reverse(java.lang.String s)
          determines the reverse of a sequence of nucleotides.
static java.lang.String reverseComplement(java.lang.String s)
          determines the reverse complement of a sequence of nucleotides.
static java.lang.String translate(java.lang.String s, int frametype, int codetype)
          gets a translation into amino acids of a string of nucleotides.
static java.lang.String translate(java.lang.String s, int frametype, int codetype, java.lang.String initial_string, java.lang.String pre_string, java.lang.String post_string)
          gets a translation into amino acids of a string of nucleotides.
static java.lang.String translate(java.lang.String s, int frametype, java.lang.String[][][] genetic_code, java.lang.String initial_string, java.lang.String pre_string, java.lang.String post_string)
          gets a translation into amino acids of a string of nucleotides.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NUCLEOTIDES

public static final int NUCLEOTIDES

COMPLEMENT

public static final int COMPLEMENT

FRAME_ONE

public static final int FRAME_ONE

FRAME_TWO

public static final int FRAME_TWO

FRAME_THREE

public static final int FRAME_THREE

FRAME_NEG_ONE

public static final int FRAME_NEG_ONE

FRAME_NEG_TWO

public static final int FRAME_NEG_TWO

FRAME_NEG_THREE

public static final int FRAME_NEG_THREE

FORWARD_SPLICED_TRANSLATION

public static final int FORWARD_SPLICED_TRANSLATION

REVERSE_SPLICED_TRANSLATION

public static final int REVERSE_SPLICED_TRANSLATION

FRAME_MAPPING

public static final int[] FRAME_MAPPING

ONE_LETTER_CODE

public static final int ONE_LETTER_CODE

THREE_LETTER_CODE

public static final int THREE_LETTER_CODE

aa1

protected static java.lang.String[][][] aa1
Genetic Code in 1-character amino acid codes

aa3

protected static java.lang.String[][][] aa3
Genetic Code in 3-character amino acid codes

LETTERS

public static final int LETTERS
number of "letters" that are valid in a string of nucleotide codes.

letter_to_id

protected static int[] letter_to_id
ordinal numbers of nucleotides associated with each possible ASCII character code. Unused characters are associated with the integer -1.

id_to_letter

protected static char[] id_to_letter
ASCII character codes for each nucleotide (or set of nucleotides).

Constructor Detail

DNAUtils

public DNAUtils()

Method Detail

complement

public static java.lang.String complement(java.lang.String s)
determines the complement of a sequence of nucleotides.
Parameters:
s - a string of nucleotide codes.
Returns:
the complementary codes.

reverseComplement

public static java.lang.String reverseComplement(java.lang.String s)
determines the reverse complement of a sequence of nucleotides.
Parameters:
s - a string of nucleotide codes.
Returns:
the complementary codes in reverse order.
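
The relationship between complement, reverse, and reverseComplement can be sketched as follows. This is a minimal illustration, not the DNAUtils source: the class ReverseComplementSketch and the helper complementOf are hypothetical names, and passing N and other unrecognized codes through unchanged is an assumption.

```java
final class ReverseComplementSketch {
    // Hypothetical helper (not part of DNAUtils): complement of one base.
    static char complementOf(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base; // assumption: N and other codes are left as-is
        }
    }

    // Complement each code, keeping the original order.
    static String complement(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(complementOf(s.charAt(i)));
        }
        return out.toString();
    }

    // Walk the input from the end so the result is reversed and complemented.
    static String reverseComplement(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            out.append(complementOf(s.charAt(i)));
        }
        return out.toString();
    }
}
```

Under these assumptions, complement("ATGCN") gives "TACGN" and reverseComplement("ATGCN") gives "NGCAT".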

reverse

public static java.lang.String reverse(java.lang.String s)
determines the reverse of a sequence of nucleotides.
Parameters:
s - a string of nucleotide codes.
Returns:
the codes in reverse order.

chunkReverse

public static java.lang.String chunkReverse(java.lang.String s,
                                            int offset,
                                            int chunk_size)
determines the reverse of a part of a sequence of nucleotides.
Parameters:
s - a string of nucleotide codes.
offset - the number of characters to skip at the beginning of s.
chunk_size - the number of characters in the portion to be reversed
Returns:
the codes of the specified chunk, in reverse order.
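
One reading of the parameters above is that the chunk is the substring starting at offset and spanning chunk_size characters. The sketch below follows that reading; it is an assumption about the behavior, not the DNAUtils implementation, and ChunkReverseSketch is a hypothetical class name.

```java
final class ChunkReverseSketch {
    // Assumption: the chunk covers indices [offset, offset + chunkSize).
    // Out-of-range arguments surface as the exception thrown by substring.
    static String chunkReverse(String s, int offset, int chunkSize) {
        String chunk = s.substring(offset, offset + chunkSize);
        return new StringBuilder(chunk).reverse().toString();
    }
}
```

With this reading, chunkReverse("ACGTAC", 1, 3) selects "CGT" and returns "TGC".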

complementBuffer

protected static void complementBuffer(java.lang.StringBuffer buf)
determines the complement of a sequence of nucleotides.
Parameters:
buf - a buffer of nucleotide codes, each of which is replaced with its complementary code.
See Also:
complement(java.lang.String)

getGeneticCodeThree

public static java.lang.String[][][] getGeneticCodeThree()
gets a representation of the genetic code. The three dimensions of the array returned correspond to the three nucleotides in a codon. Each dimension ranges from 0 to 4 representing bases A, C, G, T, and N respectively. Prefer the constants A, C, G, T, and N to the integers when subscripting the array.
Returns:
the genetic code expressed in three-character amino acid codes.

getGeneticCodeOne

public static java.lang.String[][][] getGeneticCodeOne()
gets a representation of the genetic code. The three dimensions of the array returned correspond to the three nucleotides in a codon. Each dimension ranges from 0 to 4 representing bases A, C, G, T, and N respectively. Prefer the constants A, C, G, T, and N to the integers when subscripting the array.
Returns:
the genetic code expressed in one-character amino acid codes.
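
To illustrate the array layout described above, here is a sketch that builds a 5 x 5 x 5 table indexed by base IDs 0 to 4 (A, C, G, T, N) and fills it from the standard genetic code. It is not the DNAUtils table: the class GeneticCodeSketch, the local constants A, C, G, T, N, and the choice of "X" for any codon containing N are assumptions made for the example.

```java
final class GeneticCodeSketch {
    // Index values follow the 0..4 ordering described in the text.
    static final int A = 0, C = 1, G = 2, T = 3, N = 4;

    // Standard genetic code, listed with each codon position cycling through T, C, A, G.
    private static final String TCAG = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Map a nucleotide letter to its array index; anything unrecognized maps to N.
    static int idOf(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return A;
            case 'C': return C;
            case 'G': return G;
            case 'T': return T;
            default:  return N;
        }
    }

    // Build the table; one dimension per codon position.
    static String[][][] buildOneLetterCode() {
        String[][][] code = new String[5][5][5];
        for (String[][] plane : code) {
            for (String[] row : plane) {
                java.util.Arrays.fill(row, "X"); // assumption: ambiguous codons yield "X"
            }
        }
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    char aa = AMINO_ACIDS.charAt(16 * i + 4 * j + k);
                    code[idOf(TCAG.charAt(i))][idOf(TCAG.charAt(j))][idOf(TCAG.charAt(k))] =
                        String.valueOf(aa);
                }
            }
        }
        return code;
    }
}
```

Subscripting with the named constants reads naturally: buildOneLetterCode()[A][T][G] is "M" (methionine) and [T][A][A] is "*" (stop).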

translate

public static java.lang.String translate(java.lang.String s,
                                         int frametype,
                                         int codetype)
gets a translation into amino acids of a string of nucleotides.
Parameters:
s - represents the string of nucleotides.
frametype - FRAME_ONE, FRAME_TWO, or FRAME_THREE. For reverse-strand frames, translate the reverse complement of s, then reverse the result.
codetype - ONE_LETTER_CODE or THREE_LETTER_CODE, indicating how many letters encode each amino acid.
Returns:
a representation of the amino acid sequence encoded by the given nucleotide sequence.

translate

public static java.lang.String translate(java.lang.String s,
                                         int frametype,
                                         int codetype,
                                         java.lang.String initial_string,
                                         java.lang.String pre_string,
                                         java.lang.String post_string)
gets a translation into amino acids of a string of nucleotides.
Parameters:
s - represents the string of nucleotides.
frametype - FRAME_ONE, FRAME_TWO, or FRAME_THREE. For reverse-strand frames, translate the reverse complement of s, then reverse the result.
codetype - ONE_LETTER_CODE or THREE_LETTER_CODE, indicating how many letters encode each amino acid.
initial_string - what goes at front of entire translation
pre_string - what goes before every amino acid
post_string - what goes after every amino acid
Returns:
a representation of the amino acid sequence encoded by the given nucleotide sequence.
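
The roles of the frame and of initial_string, pre_string, and post_string can be sketched as below. This is an illustration under stated assumptions, not the DNAUtils code: the class TranslateSketch is hypothetical, the frame is expressed as a plain count of bases to skip (0, 1, or 2) because the values of FRAME_ONE, FRAME_TWO, and FRAME_THREE are not documented here, codons containing unrecognized letters become 'X', and trailing bases that do not fill a codon are dropped.

```java
final class TranslateSketch {
    // Standard genetic code, with each codon position cycling through T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // One-letter amino acid for the codon starting at pos.
    static char aminoAcid(String s, int pos) {
        int index = 0;
        for (int i = 0; i < 3; i++) {
            int b = BASES.indexOf(Character.toUpperCase(s.charAt(pos + i)));
            if (b < 0) {
                return 'X'; // assumption: unknown or ambiguous base
            }
            index = index * 4 + b;
        }
        return AMINO_ACIDS.charAt(index);
    }

    // skip = number of leading bases to ignore (0, 1, or 2); an assumption
    // standing in for the forward frame constants.
    static String translate(String s, int skip, String initial, String pre, String post) {
        StringBuilder out = new StringBuilder(initial);
        for (int pos = skip; pos + 3 <= s.length(); pos += 3) {
            out.append(pre).append(aminoAcid(s, pos)).append(post);
        }
        return out.toString();
    }
}
```

For example, translate("ATGGCCTAA", 0, "", "", "") yields "MA*", the same call with skip 1 yields "WP", and translate("ATGGCCTAA", 0, ">", " ", "") yields "> M A *".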

translate

public static java.lang.String translate(java.lang.String s,
                                         int frametype,
                                         java.lang.String[][][] genetic_code,
                                         java.lang.String initial_string,
                                         java.lang.String pre_string,
                                         java.lang.String post_string)
gets a translation into amino acids of a string of nucleotides.
Parameters:
s - represents the string of nucleotides.
frametype - FRAME_ONE, FRAME_TWO, or FRAME_THREE. For reverse-strand frames, translate the reverse complement of s, then reverse the result.
genetic_code - the result of one of the getGeneticCode methods of this class.
initial_string - what goes at front of entire translation
pre_string - what goes before every amino acid
post_string - what goes after every amino acid
Returns:
a representation of the amino acid sequence encoded by the given nucleotide sequence.
See Also:
getGeneticCodeOne(), getGeneticCodeThree()

getResidueID

public static int getResidueID(char residue_letter)
gets an index into an array of codes for nucleotides.
Parameters:
residue_letter - letter representation of a nucleotide.
Returns:
ordinal of nucleotide letter code.
See Also:
getResidueChar(int)
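
The pair of lookup tables behind getResidueID and getResidueChar can be pictured with the sketch below. It mirrors the descriptions of letter_to_id (unused characters map to -1) and id_to_letter, but the details are assumptions: the class ResidueMapSketch is hypothetical, the table covers only 7-bit ASCII, only A, C, G, T, and N are registered, and lowercase letters share the uppercase IDs.

```java
final class ResidueMapSketch {
    // ID -> letter; the order A, C, G, T, N matches the 0..4 indexing of the genetic code arrays.
    static final char[] ID_TO_LETTER = {'A', 'C', 'G', 'T', 'N'};

    // Letter -> ID, indexed by ASCII code; -1 marks characters that are not nucleotide codes.
    static final int[] LETTER_TO_ID = new int[128];

    static {
        java.util.Arrays.fill(LETTER_TO_ID, -1);
        for (int id = 0; id < ID_TO_LETTER.length; id++) {
            char upper = ID_TO_LETTER[id];
            LETTER_TO_ID[upper] = id;
            LETTER_TO_ID[Character.toLowerCase(upper)] = id; // assumption: case-insensitive
        }
    }

    static int getResidueID(char letter) {
        // Characters outside the table are treated as unused.
        return letter < LETTER_TO_ID.length ? LETTER_TO_ID[letter] : -1;
    }

    static char getResidueChar(int id) {
        return ID_TO_LETTER[id];
    }
}
```

The two lookups are inverses for registered letters: getResidueChar(getResidueID('G')) returns 'G', while getResidueID('?') returns -1.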

getResidueChar

public static char getResidueChar(int residue_id)
gets a nucleotide code.
Parameters:
residue_id - ordinal of nucleotide letter code.
Returns:
letter representation of a nucleotide.
See Also:
getResidueID(char)

getNACharToIdMap

public static int[] getNACharToIdMap()
gets a map from letters to numbers each representing nucleotides.

getNAIdToCharMap

public static char[] getNAIdToCharMap()
gets a map from numbers to letters each representing nucleotides.

getAllowedDNACharacters

public static char[] getAllowedDNACharacters()
returns an array of all allowed characters used to represent nucleotides. This should follow the IUPAC specification, but does not yet.

GCcontent

public static double GCcontent(java.lang.String dna)

GCcontent

public static double GCcontent(java.lang.StringBuffer dna)
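
The two GCcontent methods carry no description here. A common definition of GC content is the fraction of bases that are G or C, and the sketch below computes that. Whether DNAUtils returns a fraction or a percentage, and how it treats N or an empty input, is not stated, so those choices are assumptions; GcContentSketch and gcFraction are hypothetical names.

```java
final class GcContentSketch {
    // Accepts CharSequence so both String and StringBuffer inputs work.
    // Assumption: result is a fraction in [0, 1]; empty input returns 0.0.
    static double gcFraction(CharSequence dna) {
        if (dna.length() == 0) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toUpperCase(dna.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        // Assumption: every character, including N, counts toward the total.
        return (double) gc / dna.length();
    }
}
```

Under these assumptions, gcFraction("ATGC") is 0.5 and gcFraction(new StringBuffer("GGCC")) is 1.0.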