de.jstacs.data.sequences
Class SparseSequence

java.lang.Object
  extended by de.jstacs.data.sequences.Sequence<int[]>
      extended by de.jstacs.data.sequences.SimpleDiscreteSequence
          extended by de.jstacs.data.sequences.SparseSequence
All Implemented Interfaces:
Comparable<Sequence<int[]>>

public final class SparseSequence
extends SimpleDiscreteSequence

This class implements sequences over a single alphabet with at most 4 symbols. It can be used, for instance, for DNA sequences.

The symbols are encoded in the bits of the primitive type long using 2 bits per symbol, so that 32 symbols fit into one long. An instance of this class is therefore more memory efficient than any other SimpleDiscreteSequence, e.g. ByteSequence, but slightly slower when accessing single positions.
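
The packing described above can be illustrated with a minimal, self-contained sketch. It is not the Jstacs source: the class name PackedSymbols, the helper wordCount() and the bit layout (the first symbol of each long in its two lowest bits) are assumptions made for illustration only.

```java
// Illustrative sketch only; not the Jstacs implementation.
// Assumption: symbols are already encoded as ints in the range 0..3.
final class PackedSymbols {
    private static final int BITS_PER_SYMBOL = 2;                            // 4 symbols need 2 bits
    private static final int SYMBOLS_PER_LONG = Long.SIZE / BITS_PER_SYMBOL; // 64 / 2 = 32
    private static final long MASK = 0b11L;

    private final long[] words;
    private final int length;

    PackedSymbols(int[] symbols) {
        length = symbols.length;
        // Round up so that a partially filled last long is still allocated.
        words = new long[(length + SYMBOLS_PER_LONG - 1) / SYMBOLS_PER_LONG];
        for (int pos = 0; pos < length; pos++) {
            int s = symbols[pos];
            if (s < 0 || s > 3) {
                throw new IllegalArgumentException("symbol out of range at position " + pos + ": " + s);
            }
            int shift = (pos % SYMBOLS_PER_LONG) * BITS_PER_SYMBOL;
            words[pos / SYMBOLS_PER_LONG] |= ((long) s) << shift;
        }
    }

    // Reading one position requires locating the long, shifting and masking.
    int discreteVal(int pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("position " + pos + " not in [0, " + length + ")");
        }
        int shift = (pos % SYMBOLS_PER_LONG) * BITS_PER_SYMBOL;
        return (int) ((words[pos / SYMBOLS_PER_LONG] >>> shift) & MASK);
    }

    int getLength() {
        return length;
    }

    // Number of longs used for storage (hypothetical helper for inspection).
    int wordCount() {
        return words.length;
    }
}
```

In this sketch a sequence of 40 symbols occupies two longs, whereas a byte-per-symbol layout would need 40 bytes. The extra division, shift and mask on every read is the kind of overhead that makes single-position access slightly slower.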

Author:
Jens Keilwagen

Nested Class Summary
 
Nested classes/interfaces inherited from class de.jstacs.data.sequences.Sequence
Sequence.CompositeSequence<T>, Sequence.RecursiveSequence<T>, Sequence.SubSequence<T>
 
Field Summary
 
Fields inherited from class de.jstacs.data.sequences.Sequence
alphabetCon, annotation, rc
 
Constructor Summary
SparseSequence(AlphabetContainer alphCon, String seq)
          Creates a new SparseSequence from a String representation.
SparseSequence(AlphabetContainer alphCon, SymbolExtractor se)
          Creates a new SparseSequence from a SymbolExtractor.
 
Method Summary
 SparseSequence complement(int start, int end)
          Returns a new Sequence containing the complement of a part of the current Sequence.
 int discreteVal(int pos)
          Returns the discrete value at position pos of the Sequence.
protected  SparseSequence flatCloneWithoutAnnotation()
          Works in analogy to Object.clone(), but does not clone the annotation.
static DataSet getDataSet(AlphabetContainer con, AbstractStringExtractor... se)
          Creates a DataSet containing SparseSequences.
static DataSet getDataSet(AlphabetContainer con, String filename)
          Creates a DataSet containing SparseSequences from a file.
static DataSet getDataSet(AlphabetContainer con, String filename, SequenceAnnotationParser parser)
          Creates a DataSet containing SparseSequences from a file, parsing annotations with the supplied SequenceAnnotationParser.
 int getLength()
          Returns the length of the Sequence.
 SparseSequence reverse(int start, int end)
          Returns a new Sequence containing the reverse of a part of the current Sequence.
 SparseSequence reverseComplement(int start, int end)
          Returns a new Sequence containing the reverse complement of a part of the current Sequence.
 
Methods inherited from class de.jstacs.data.sequences.SimpleDiscreteSequence
addToRepresentation, compareTo, continuousVal, fillContainer, getEmptyContainer, getEmptyRepresentation, getStringRepresentation, hashCodeForPos, isMultiDimensional
 
Methods inherited from class de.jstacs.data.sequences.Sequence
annotate, compareTo, complement, create, create, create, equals, getAlphabetContainer, getAnnotation, getCompositeSequence, getCompositeSequence, getHammingDistance, getNumberOfSequenceAnnotationsByType, getSequenceAnnotationByType, getSequenceAnnotationByTypeAndIdentifier, getSubSequence, getSubSequence, getSubSequence, getSubSequence, hashCode, matches, reverse, reverseComplement, toDiscrete, toString, toString, toString, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SparseSequence

public SparseSequence(AlphabetContainer alphCon,
                      String seq)
               throws WrongSequenceTypeException,
                      WrongAlphabetException
Creates a new SparseSequence from a String representation.

Parameters:
alphCon - the AlphabetContainer
seq - the sequence as String
Throws:
WrongSequenceTypeException - if the AlphabetContainer is not simple or the internal Alphabet has more than 4 symbols
WrongAlphabetException - if the AlphabetContainer is not discrete
See Also:
SparseSequence(AlphabetContainer, SymbolExtractor)

SparseSequence

public SparseSequence(AlphabetContainer alphCon,
                      SymbolExtractor se)
               throws WrongSequenceTypeException,
                      WrongAlphabetException
Creates a new SparseSequence from a SymbolExtractor.

Parameters:
alphCon - the AlphabetContainer
se - the SymbolExtractor
Throws:
WrongSequenceTypeException - if the AlphabetContainer is not simple or the internal Alphabet has more than 4 symbols
WrongAlphabetException - if the AlphabetContainer is not discrete
See Also:
SparseSequence(AlphabetContainer, int, SequenceAnnotation[])

Method Detail

discreteVal

public int discreteVal(int pos)
Description copied from class: Sequence
Returns the discrete value at position pos of the Sequence.

Specified by:
discreteVal in class Sequence<int[]>
Parameters:
pos - the position of the Sequence
Returns:
the discrete value at position pos of the Sequence

getLength

public int getLength()
Description copied from class: Sequence
Returns the length of the Sequence.

Specified by:
getLength in class Sequence<int[]>
Returns:
the length of the Sequence

complement

public SparseSequence complement(int start,
                                 int end)
                          throws OperationNotSupportedException
Description copied from class: Sequence
Returns a new Sequence containing the complement of a part of the current Sequence.
For instance, invoking this method on the full range of the sequence "TAATA" with an AlphabetContainer based on DNAAlphabet returns "ATTAT".

Overrides:
complement in class Sequence<int[]>
Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the complementary Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is not based on a ComplementableDiscreteAlphabet
See Also:
ComplementableDiscreteAlphabet
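
The half-open range convention (start inclusive, end exclusive) can be sketched on plain Strings as follows. This is an illustration under assumptions, not the library code: ComplementSketch is a hypothetical class, it works on the characters A, C, G and T instead of encoded symbols, and it signals errors with unchecked exceptions rather than OperationNotSupportedException.

```java
// Illustrative sketch only; operates on Strings, not on Jstacs Sequences.
final class ComplementSketch {

    // Watson-Crick complement of a single upper-case DNA symbol.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default: throw new IllegalArgumentException("not a DNA symbol: " + base);
        }
    }

    // Complement of seq[start, end): start is inclusive, end is exclusive.
    static String complement(String seq, int start, int end) {
        if (start < 0 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException("invalid range [" + start + ", " + end + ")");
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }
}
```

With this sketch, ComplementSketch.complement("TAATA", 0, 5) yields "ATTAT", and the range (1, 3) yields "TT".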

reverse

public SparseSequence reverse(int start,
                              int end)
                       throws OperationNotSupportedException
Description copied from class: Sequence
Returns a new Sequence containing the reverse of a part of the current Sequence.

Overrides:
reverse in class Sequence<int[]>
Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the reverse Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is based on an AlphabetContainer that is not simple

reverseComplement

public SparseSequence reverseComplement(int start,
                                        int end)
                                 throws OperationNotSupportedException
Description copied from class: Sequence
Returns a new Sequence containing the reverse complement of a part of the current Sequence. For more details see the methods Sequence.reverse() and Sequence.complement().

Overrides:
reverseComplement in class Sequence<int[]>
Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the reverse complementary Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is not discrete and simple (not based on a ComplementableDiscreteAlphabet)
See Also:
Sequence.reverse(), Sequence.complement(), ComplementableDiscreteAlphabet
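
How reverse and reverse complement relate on a range can be sketched on plain Strings. As before, this is only an illustration under assumptions: ReverseComplementSketch is a hypothetical class, the symbols are the characters A, C, G and T, and start and end are interpreted on the original sequence as documented above.

```java
// Illustrative sketch only; operates on Strings, not on Jstacs Sequences.
final class ReverseComplementSketch {

    private static void checkRange(String seq, int start, int end) {
        if (start < 0 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException("invalid range [" + start + ", " + end + ")");
        }
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default: throw new IllegalArgumentException("not a DNA symbol: " + base);
        }
    }

    // Reverse of seq[start, end).
    static String reverse(String seq, int start, int end) {
        checkRange(seq, start, end);
        return new StringBuilder(seq.substring(start, end)).reverse().toString();
    }

    // Reverse complement of seq[start, end): walk the range backwards
    // and complement each symbol on the way.
    static String reverseComplement(String seq, int start, int end) {
        checkRange(seq, start, end);
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = end - 1; i >= start; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }
}
```

For the sequence "ACGTT", the sketch returns "TTGCA" for reverse(0, 5) and "AACGT" for reverseComplement(0, 5). Doing both steps in a single backward pass avoids creating an intermediate sequence.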

flatCloneWithoutAnnotation

protected SparseSequence flatCloneWithoutAnnotation()
Description copied from class: Sequence
Works in analogy to Object.clone(), but does not clone the annotation. This method is used in Sequence.annotate(boolean, SequenceAnnotation...).

Specified by:
flatCloneWithoutAnnotation in class Sequence<int[]>
Returns:
the cloned Sequence without annotation

getDataSet

public static DataSet getDataSet(AlphabetContainer con,
                                 String filename,
                                 SequenceAnnotationParser parser)
                          throws FileNotFoundException,
                                 WrongAlphabetException,
                                 WrongSequenceTypeException,
                                 EmptyDataSetException,
                                 IOException
Creates a DataSet containing SparseSequences from a file. Annotations are parsed by the supplied SequenceAnnotationParser. The file is assumed to be in FastA format.

Parameters:
con - the AlphabetContainer for the DataSet and SparseSequences
filename - the file name
parser - a parser for the annotations of the SparseSequences
Returns:
a DataSet containing SparseSequences
Throws:
FileNotFoundException - if the file filename could not be found
WrongAlphabetException - if the alphabet does not fit the data
WrongSequenceTypeException - if the data cannot be represented as SparseSequences
EmptyDataSetException - if no sequences exist in filename
IOException - if the file could not be read
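
For readers unfamiliar with the FastA layout assumed here (a header line starting with '>' followed by one or more sequence lines), the following self-contained sketch shows how such input splits into records. It does not use or reproduce the Jstacs parser: FastaSketch and its read method are hypothetical, the header is kept as an unparsed String, and an empty input is reported with an unchecked exception instead of EmptyDataSetException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Jstacs FastA handling.
final class FastaSketch {

    // Returns one String[] per record: [0] = header without '>', [1] = sequence.
    static List<String[]> read(Reader in) throws IOException {
        List<String[]> records = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.charAt(0) == '>') {
                // A new header closes the previous record, if any.
                if (header != null) {
                    records.add(new String[] { header, seq.toString() });
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                // Sequence data may span several lines; concatenate them.
                seq.append(line);
            }
        }
        if (header != null) {
            records.add(new String[] { header, seq.toString() });
        }
        if (records.isEmpty()) {
            throw new IllegalStateException("no sequences found");
        }
        return records;
    }
}
```

In this sketch the header text after '>' is the part an annotation parser would typically receive, while the concatenated sequence lines are what would be encoded against the AlphabetContainer.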

getDataSet

public static DataSet getDataSet(AlphabetContainer con,
                                 String filename)
                          throws FileNotFoundException,
                                 WrongAlphabetException,
                                 WrongSequenceTypeException,
                                 EmptyDataSetException,
                                 IOException
Creates a DataSet containing SparseSequences from a file.

Parameters:
con - the AlphabetContainer for the DataSet and SparseSequences
filename - the file name
Returns:
a DataSet containing SparseSequences
Throws:
FileNotFoundException - if the file filename could not be found
WrongAlphabetException - if the alphabet does not fit the data
WrongSequenceTypeException - if the data cannot be represented as SparseSequences
EmptyDataSetException - if no sequences exist in filename
IOException - if the file could not be read

getDataSet

public static final DataSet getDataSet(AlphabetContainer con,
                                       AbstractStringExtractor... se)
                                throws WrongSequenceTypeException,
                                       WrongAlphabetException,
                                       EmptyDataSetException
Creates a DataSet containing SparseSequences.

Parameters:
con - the AlphabetContainer for the DataSet and Sequences
se - the AbstractStringExtractors that provide the data as Strings
Returns:
a DataSet containing SparseSequences
Throws:
WrongSequenceTypeException - if the AlphabetContainer is not simple or the internal Alphabet has more than 4 symbols
WrongAlphabetException - if the AlphabetContainer is not discrete
EmptyDataSetException - if a DataSet with 0 (zero) Sequences would be created