de.jstacs.data.sequences
Class Sequence<T>

java.lang.Object
  extended by de.jstacs.data.sequences.Sequence<T>
Type Parameters:
T - the type of each position
All Implemented Interfaces:
Comparable<Sequence<T>>
Direct Known Subclasses:
ArbitraryFloatSequence, ArbitrarySequence, MultiDimensionalSequence, Sequence.RecursiveSequence, SimpleDiscreteSequence

public abstract class Sequence<T>
extends Object
implements Comparable<Sequence<T>>

This is the main class for all sequences. All sequences are immutable.

Author:
Jens Keilwagen

Nested Class Summary
protected static class Sequence.CompositeSequence<T>
          The class handles composite Sequences.
static class Sequence.RecursiveSequence<T>
          This is the main class for subsequences, composite sequences, ...
protected static class Sequence.SubSequence<T>
          This class handles subsequences.
 
Field Summary
protected  AlphabetContainer alphabetCon
          The underlying alphabets.
protected  SequenceAnnotation[] annotation
          The annotation of the Sequence.
protected  Sequence<T> rc
          The pointer to the reverse complement of the Sequence.
 
Constructor Summary
protected Sequence(AlphabetContainer container, SequenceAnnotation[] annotation)
          Creates a new Sequence with the given AlphabetContainer and the given annotation, but without the content.
 
Method Summary
protected abstract  void addToRepresentation(Object representation, int pos, String delim)
          This method adds the information of one position to the representation using the specified delimiter.
 Sequence annotate(boolean add, SequenceAnnotation... annotation)
          This method appends annotation to a Sequence.
 int compareTo(Sequence<T> s)
           
protected abstract  int compareTo(T t1, T t2)
          This method compares two containers and is used in compareTo(Sequence).
 Sequence complement()
          This method returns a new instance of Sequence containing the complement of the current Sequence.
 Sequence complement(int start, int end)
          This method returns a new instance of Sequence containing the complement of a part of the current Sequence.
abstract  double continuousVal(int pos)
          Returns the continuous value at position pos of the Sequence.
static Sequence create(AlphabetContainer con, SequenceAnnotation[] annotation, String sequence, String delim)
          Creates a Sequence from a String based on the given AlphabetContainer using the given delimiter delim and some annotation for the Sequence.
static Sequence create(AlphabetContainer con, String sequence)
          Creates a Sequence from a String based on the given AlphabetContainer using the standard delimiter for this AlphabetContainer.
static Sequence create(AlphabetContainer con, String sequence, String delim)
          Creates a Sequence from a String based on the given AlphabetContainer using the given delimiter delim.
abstract  int discreteVal(int pos)
          Returns the discrete value at position pos of the Sequence.
 boolean equals(Object o)
           
abstract  void fillContainer(T container, int pos)
          The method fills the content of a specific position into the container.
protected abstract  Sequence flatCloneWithoutAnnotation()
          Works in analogy to Object.clone(), but does not clone the annotation.
 AlphabetContainer getAlphabetContainer()
          Returns the alphabets, i.e. the AlphabetContainer, used in this Sequence.
 SequenceAnnotation[] getAnnotation()
          Returns the annotation of the Sequence.
 Sequence<T> getCompositeSequence(AlphabetContainer abc, int[] starts, int[] lengths)
          This method should be used if one wants to create a DataSet of Sequence.CompositeSequences.
 Sequence getCompositeSequence(int[] starts, int[] lengths)
          This is a very efficient way to create a Sequence.CompositeSequence for sequences with a simple AlphabetContainer.
abstract  T getEmptyContainer()
          The method returns a container that can be used for accessing the symbols for each position.
protected abstract  Object getEmptyRepresentation()
          Returns an empty representation which is used to create the String representation of this instance in the method toString(String, int, int).
 int getHammingDistance(Sequence seq)
          This method returns the Hamming distance between the current Sequence and seq.
abstract  int getLength()
          Returns the length of the Sequence.
 int getNumberOfSequenceAnnotationsByType(String type)
          Returns the number of SequenceAnnotations of type type for this Sequence.
 SequenceAnnotation getSequenceAnnotationByType(String type, int idx)
          Returns the SequenceAnnotation no. idx of this Sequence that has type type.
 SequenceAnnotation getSequenceAnnotationByTypeAndIdentifier(String type, String identifier)
          Returns the SequenceAnnotation of this Sequence that has type type and identifier identifier.
protected abstract  String getStringRepresentation(Object representation)
          This method creates a String representation from the given representation.
 Sequence getSubSequence(AlphabetContainer abc, int start)
          This method should be used if one wants to create a DataSet of subsequences of defined length.
 Sequence getSubSequence(AlphabetContainer abc, int start, int length)
          This method should be used if one wants to create a DataSet of subsequences of defined length.
 Sequence getSubSequence(int start)
          This is a very efficient way to create a subsequence/suffix for Sequences with a simple AlphabetContainer.
 Sequence getSubSequence(int start, int length)
          This is a very efficient way to create a subsequence of defined length for Sequences with a simple AlphabetContainer.
 int hashCode()
           
protected abstract  int hashCodeForPos(int pos)
          This method is used in hashCode() and returns the hash code for one specific position.
abstract  boolean isMultiDimensional()
          The method returns true if the sequence is multidimensional, otherwise false.
 boolean matches(int maxHammingDistance, Sequence shortSequence)
          This method answers the question whether a match of shortSequence with a given maximal number of mismatches can be found in the current Sequence.
 Sequence reverse()
          This method returns a new instance of Sequence containing the reverse of the current Sequence.
 Sequence reverse(int start, int end)
          This method returns a new instance of Sequence containing the reverse of a part of the current Sequence.
 Sequence reverseComplement()
          This method returns a new instance of Sequence containing the reverse complement of the current Sequence.
 Sequence reverseComplement(int start, int end)
          This method returns a new instance of Sequence containing the reverse complement of a part of the current Sequence.
protected  int toDiscrete(int pos, double content)
          This method converts a continuous value at position pos of the Sequence into a discrete one.
 String toString()
          Returns a String representation of the Sequence (normally the Sequence in its original Alphabet).
 String toString(int start)
          Returns a String representation of the Sequence (normally the Sequence in its original Alphabet) beginning at position start with a default delimiter as separator.
 String toString(int start, int end)
          Returns a String representation of the Sequence (normally the Sequence in its original Alphabet) between start and end with a default delimiter as separator.
 String toString(String delim, int start, int end)
          Returns a String representation of the Sequence (normally the Sequence in its original alphabet) between start and end with delim as separator.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

alphabetCon

protected AlphabetContainer alphabetCon
The underlying alphabets.


rc

protected Sequence<T> rc
The pointer to the reverse complement of the Sequence.


annotation

protected SequenceAnnotation[] annotation
The annotation of the Sequence.

Constructor Detail

Sequence

protected Sequence(AlphabetContainer container,
                   SequenceAnnotation[] annotation)
Creates a new Sequence with the given AlphabetContainer and the given annotation, but without the content. The content has to be set by the constructor of the extending class.

Parameters:
container - the AlphabetContainer of the Sequence
annotation - the annotation of the Sequence
Method Detail

continuousVal

public abstract double continuousVal(int pos)
Returns the continuous value at position pos of the Sequence.

Parameters:
pos - the position of the Sequence
Returns:
the continuous value at position pos of the Sequence

discreteVal

public abstract int discreteVal(int pos)
Returns the discrete value at position pos of the Sequence.

Parameters:
pos - the position of the Sequence
Returns:
the discrete value at position pos of the Sequence

equals

public boolean equals(Object o)
Overrides:
equals in class Object

getAlphabetContainer

public final AlphabetContainer getAlphabetContainer()
Return the alphabets, i.e. the AlphabetContainer, used in this Sequence.

Returns:
the alphabets, i.e. the AlphabetContainer, used in this Sequence

getAnnotation

public final SequenceAnnotation[] getAnnotation()
Returns the annotation of the Sequence.

Returns:
the annotation of the Sequence (can be null)

getSequenceAnnotationByTypeAndIdentifier

public SequenceAnnotation getSequenceAnnotationByTypeAndIdentifier(String type,
                                                                   String identifier)
Returns the SequenceAnnotation of this Sequence that has type type and identifier identifier.

Parameters:
type - the chosen type of the SequenceAnnotation
identifier - the chosen identifier of the SequenceAnnotation
Returns:
the first SequenceAnnotation that meets the criteria

getSequenceAnnotationByType

public SequenceAnnotation getSequenceAnnotationByType(String type,
                                                      int idx)
Returns the SequenceAnnotation no. idx of this Sequence that has type type.

Parameters:
type - the chosen type of a subset of SequenceAnnotations
idx - the index of the returned SequenceAnnotation within this subset.
Returns:
the SequenceAnnotation no. idx with type type

getNumberOfSequenceAnnotationsByType

public int getNumberOfSequenceAnnotationsByType(String type)
Returns the number of SequenceAnnotations of type type for this Sequence.

Parameters:
type - the type
Returns:
the number of annotations
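
The three lookup methods above select annotations by type, by position within the subset of one type, and by type plus identifier. The following is a minimal standalone sketch of that selection logic. It does not use the Jstacs classes: annotations are modelled as hypothetical {type, identifier} string pairs, the index is assumed to be 0-based, and returning null when nothing matches is an assumption.

```java
// Standalone sketch: an annotation is modelled as a {type, identifier} pair,
// which is a simplification and not the Jstacs SequenceAnnotation class.
final class AnnotationLookupSketch {

    // Number of annotations whose type equals the requested type.
    static int countByType(String[][] annotations, String type) {
        int n = 0;
        if (annotations != null) {
            for (String[] a : annotations) {
                if (a[0].equals(type)) {
                    n++;
                }
            }
        }
        return n;
    }

    // The idx-th annotation (0-based, an assumption) within the subset of the given type.
    // Returning null if there is no such annotation is also an assumption.
    static String[] byType(String[][] annotations, String type, int idx) {
        if (annotations != null) {
            for (String[] a : annotations) {
                if (a[0].equals(type) && idx-- == 0) {
                    return a;
                }
            }
        }
        return null;
    }

    // The first annotation that has both the given type and the given identifier.
    static String[] byTypeAndIdentifier(String[][] annotations, String type, String identifier) {
        if (annotations != null) {
            for (String[] a : annotations) {
                if (a[0].equals(type) && a[1].equals(identifier)) {
                    return a;
                }
            }
        }
        return null;
    }
}
```

Since getAnnotation() may return null, the sketch treats a missing annotation array as an empty one.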

getCompositeSequence

public Sequence<T> getCompositeSequence(AlphabetContainer abc,
                                        int[] starts,
                                        int[] lengths)
This method should be used if one wants to create a DataSet of Sequence.CompositeSequences. It allows you to create a DataSet in which every Sequence has the same AlphabetContainer instance.

Internally it is checked that the AlphabetContainer matches with the one of the Sequence.CompositeSequence.

Parameters:
abc - the new AlphabetContainer
starts - the start positions of the chunks
lengths - the length of each chunk
Returns:
the Sequence.CompositeSequence
See Also:
Sequence.CompositeSequence.Sequence.CompositeSequence(de.jstacs.data.AlphabetContainer, de.jstacs.data.sequences.Sequence, int[], int[])

getCompositeSequence

public Sequence getCompositeSequence(int[] starts,
                                     int[] lengths)
This is a very efficient way to create a Sequence.CompositeSequence for sequences with a simple AlphabetContainer.

Parameters:
starts - the start positions of the chunks
lengths - the length of each chunk
Returns:
the Sequence.CompositeSequence
See Also:
Sequence.CompositeSequence.Sequence.CompositeSequence(de.jstacs.data.sequences.Sequence, int[], int[])
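
To make the roles of starts and lengths concrete, here is a minimal standalone sketch that concatenates chunks of a plain String. It only illustrates the parameter semantics and is not how Sequence.CompositeSequence is implemented; the class and method names are hypothetical.

```java
// Standalone sketch: builds the concatenation of the chunks
// [starts[i], starts[i] + lengths[i]) of a plain String.
final class CompositeSketch {

    static String composite(String seq, int[] starts, int[] lengths) {
        if (starts.length != lengths.length) {
            throw new IllegalArgumentException("starts and lengths must have the same number of entries");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < starts.length; i++) {
            // append(CharSequence, start, end) copies the half-open range [start, end)
            sb.append(seq, starts[i], starts[i] + lengths[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // chunk 0 covers positions 0..1, chunk 1 covers positions 5..7
        System.out.println(composite("ACGTACGT", new int[] {0, 5}, new int[] {2, 3}));
    }
}
```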

getSubSequence

public final Sequence getSubSequence(AlphabetContainer abc,
                                     int start)
This method should be used if one wants to create a DataSet of subsequences of defined length. It allows you to create a DataSet in which every Sequence has the same AlphabetContainer instance.

Internally it is checked that the AlphabetContainer matches with the one of the subsequence.

Parameters:
abc - the new AlphabetContainer
start - the index of the start position
Returns:
the subsequence
See Also:
getSubSequence(de.jstacs.data.AlphabetContainer, int, int)

getSubSequence

public Sequence getSubSequence(AlphabetContainer abc,
                               int start,
                               int length)
This method should be used if one wants to create a DataSet of subsequences of defined length. It allows you to create a DataSet in which every Sequence has the same AlphabetContainer instance.

Internally it is checked that the AlphabetContainer matches with the one of the subsequence.

Parameters:
abc - the new AlphabetContainer
start - the index of the start position
length - the length of the new Sequence
Returns:
the subsequence
See Also:
SubSequence#SubSequence(de.jstacs.data.AlphabetContainer, de.jstacs.data.Sequence, int, int)

getSubSequence

public final Sequence getSubSequence(int start)
This is a very efficient way to create a subsequence/suffix for Sequences with a simple AlphabetContainer.

Parameters:
start - the index of the start position
Returns:
the subsequence
See Also:
getSubSequence(int, int)

getSubSequence

public Sequence getSubSequence(int start,
                               int length)
This is a very efficient way to create a subsequence of defined length for Sequences with a simple AlphabetContainer.

Parameters:
start - the index of the start position
length - the length of the new Sequence
Returns:
the subsequence
See Also:
SubSequence#SubSequence(Sequence, int, int)
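
Because all sequences are immutable, a subsequence can in principle be a view onto the parent's content instead of a copy. The following standalone sketch shows that idea with an int array as content. Whether Sequence.SubSequence works this way is an assumption; the class name and fields are hypothetical.

```java
// Standalone sketch of a subsequence as a view: the view shares the
// content array with its parent and only stores an offset and a length.
final class SubSequenceViewSketch {
    private final int[] content; // shared between a sequence and its views, never modified
    private final int offset;
    private final int length;

    SubSequenceViewSketch(int[] content) {
        // copy once so that later changes to the caller's array cannot leak in
        this(content.clone(), 0, content.length);
    }

    private SubSequenceViewSketch(int[] content, int offset, int length) {
        this.content = content;
        this.offset = offset;
        this.length = length;
    }

    int getLength() {
        return length;
    }

    int discreteVal(int pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("position " + pos + " is outside [0, " + length + ")");
        }
        return content[offset + pos];
    }

    // Subsequence of defined length; no symbols are copied.
    SubSequenceViewSketch getSubSequence(int start, int length) {
        if (start < 0 || length < 0 || start + length > this.length) {
            throw new IllegalArgumentException("illegal start or length");
        }
        return new SubSequenceViewSketch(content, offset + start, length);
    }

    // Suffix beginning at start.
    SubSequenceViewSketch getSubSequence(int start) {
        return getSubSequence(start, length - start);
    }
}
```

Creating a view costs constant time regardless of its length, and views of views stay flat because each one points directly at the shared array.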

annotate

public Sequence annotate(boolean add,
                         SequenceAnnotation... annotation)
This method appends annotation to a Sequence.

Parameters:
add - indicates whether to add the new annotation to the existing or not
annotation - the new annotation
Returns:
the new annotated Sequence
See Also:
flatCloneWithoutAnnotation()

flatCloneWithoutAnnotation

protected abstract Sequence flatCloneWithoutAnnotation()
Works in analogy to Object.clone(), but does not clone the annotation. This method is used in annotate(boolean, SequenceAnnotation...).

Returns:
the cloned Sequence without annotation
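
Since sequences are immutable, annotate returns a new instance rather than changing the current one. The following standalone sketch shows one way this can look, with annotations reduced to plain strings. It assumes that add == false replaces the existing annotation; the class is hypothetical and does not use the Jstacs types.

```java
// Standalone sketch of annotating an immutable object:
// the original instance is never changed, a new one is returned.
final class AnnotatedSketch {
    private final String content;
    private final String[] annotation; // may be null

    AnnotatedSketch(String content, String[] annotation) {
        this.content = content;
        this.annotation = annotation == null ? null : annotation.clone();
    }

    // add == true appends to the existing annotation,
    // add == false replaces it (an assumption about the flag's meaning).
    AnnotatedSketch annotate(boolean add, String... newAnnotation) {
        String[] combined;
        if (add && annotation != null) {
            combined = new String[annotation.length + newAnnotation.length];
            System.arraycopy(annotation, 0, combined, 0, annotation.length);
            System.arraycopy(newAnnotation, 0, combined, annotation.length, newAnnotation.length);
        } else {
            combined = newAnnotation.clone();
        }
        return new AnnotatedSketch(content, combined);
    }

    String getContent() {
        return content;
    }

    String[] getAnnotation() {
        return annotation == null ? null : annotation.clone();
    }
}
```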

getLength

public abstract int getLength()
Returns the length of the Sequence.

Returns:
the length of the Sequence

toString

public String toString()
Returns a String representation of the Sequence (normally the Sequence in its original Alphabet).

Overrides:
toString in class Object
Returns:
the Sequence as String
See Also:
toString(String, int, int)

toString

public String toString(int start)
Returns a String representation of the Sequence (normally the Sequence in its original Alphabet) beginning at position start with a default delimiter as separator.

Parameters:
start - the start index (inclusive)
Returns:
the Sequence as String
See Also:
toString(String, int, int)

toString

public String toString(int start,
                       int end)
Returns a String representation of the Sequence (normally the Sequence in its original Alphabet) between start and end with a default delimiter as separator.

Parameters:
start - the start index (inclusive)
end - the end index (exclusive)
Returns:
the Sequence as String
See Also:
toString(String, int, int)

compareTo

public int compareTo(Sequence<T> s)
Specified by:
compareTo in interface Comparable<Sequence<T>>

compareTo

protected abstract int compareTo(T t1,
                                 T t2)
This method compares two containers and is used in compareTo(Sequence).

Parameters:
t1 - the first container
t2 - the second container
Returns:
zero if arguments are equal
See Also:
getEmptyContainer(), fillContainer(Object, int), Comparable.compareTo(java.lang.Object)

toDiscrete

protected int toDiscrete(int pos,
                         double content)
This method converts a continuous value at position pos of the Sequence into a discrete one.

Parameters:
pos - the position of the Sequence
content - the value at this position
Returns:
the discrete value for this position
See Also:
AlphabetContainer.toDiscrete(int, double)

toString

public String toString(String delim,
                       int start,
                       int end)
Returns a String representation of the Sequence (normally the Sequence in its original alphabet) between start and end with delim as separator.

Parameters:
delim - the delimiter/separator
start - the start index (inclusive)
end - the end index (exclusive)
Returns:
the Sequence as String
See Also:
getEmptyRepresentation(), addToRepresentation(Object, int, String), getStringRepresentation(Object)

getEmptyRepresentation

protected abstract Object getEmptyRepresentation()
Returns an empty representation which is used to create the String representation of this instance in the method toString(String, int, int).

Returns:
an empty representation which is used to create the String representation
See Also:
toString(String, int, int)

addToRepresentation

protected abstract void addToRepresentation(Object representation,
                                            int pos,
                                            String delim)
This method adds the information of one position to the representation using the specified delimiter.

Parameters:
representation - the representation
pos - the position
delim - the delimiter separating the information for different positions
See Also:
getEmptyRepresentation(), toString(String, int, int)

getStringRepresentation

protected abstract String getStringRepresentation(Object representation)
This method creates a String representation from the given representation.

Parameters:
representation - the representation instance (which should be created by getEmptyRepresentation() and filled by addToRepresentation(Object, int, String))
Returns:
a String representation
See Also:
getEmptyRepresentation(), addToRepresentation(Object, int, String), toString(String, int, int)
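
The three representation hooks are meant to be used together by toString(String, int, int). The following standalone sketch shows one plausible wiring, with a StringBuilder as the representation object. The exact control flow inside Sequence and the delimiter handling are assumptions; the class names are hypothetical.

```java
// Standalone sketch of the three-step string building:
// create an empty representation, add each position, convert to a String.
abstract class RepresentationSketch {
    abstract Object getEmptyRepresentation();

    abstract void addToRepresentation(Object representation, int pos, String delim);

    abstract String getStringRepresentation(Object representation);

    // start is inclusive, end is exclusive
    String toString(String delim, int start, int end) {
        Object representation = getEmptyRepresentation();
        for (int pos = start; pos < end; pos++) {
            addToRepresentation(representation, pos, delim);
        }
        return getStringRepresentation(representation);
    }
}

// A concrete subclass over characters that uses a StringBuilder as representation.
final class CharSequenceSketch extends RepresentationSketch {
    private final char[] symbols;

    CharSequenceSketch(String s) {
        symbols = s.toCharArray();
    }

    Object getEmptyRepresentation() {
        return new StringBuilder();
    }

    void addToRepresentation(Object representation, int pos, String delim) {
        StringBuilder sb = (StringBuilder) representation;
        if (sb.length() > 0) {
            sb.append(delim); // delimiter only between positions
        }
        sb.append(symbols[pos]);
    }

    String getStringRepresentation(Object representation) {
        return representation.toString();
    }
}
```

Passing the representation as an Object lets each subclass pick its own builder type without changing the shared toString logic.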

create

public static Sequence create(AlphabetContainer con,
                              String sequence)
                       throws WrongAlphabetException,
                              IllegalArgumentException
Creates a Sequence from a String based on the given AlphabetContainer using the standard delimiter for this AlphabetContainer.

Parameters:
con - the AlphabetContainer
sequence - the String containing the Sequence
Returns:
a new Sequence instance
Throws:
WrongAlphabetException - if sequence is not defined over con
IllegalArgumentException - if the delimiter is empty and the AlphabetContainer is not discrete
See Also:
create(AlphabetContainer, String, String)

create

public static Sequence create(AlphabetContainer con,
                              String sequence,
                              String delim)
                       throws WrongAlphabetException,
                              IllegalArgumentException
Creates a Sequence from a String based on the given AlphabetContainer using the given delimiter delim.

Parameters:
con - the AlphabetContainer
sequence - the String containing the Sequence
delim - the given delimiter
Returns:
a new Sequence instance
Throws:
WrongAlphabetException - if sequence is not defined over con
IllegalArgumentException - if the delimiter is empty and the AlphabetContainer is not discrete
See Also:
create(AlphabetContainer, SequenceAnnotation[], String, String)

create

public static Sequence create(AlphabetContainer con,
                              SequenceAnnotation[] annotation,
                              String sequence,
                              String delim)
                       throws WrongAlphabetException,
                              IllegalArgumentException
Creates a Sequence from a String based on the given AlphabetContainer using the given delimiter delim and some annotation for the Sequence.

Parameters:
con - the AlphabetContainer
annotation - the annotation for the Sequence
sequence - the String containing the Sequence
delim - the given delimiter
Returns:
a new Sequence instance
Throws:
WrongAlphabetException - if sequence is not defined over con
IllegalArgumentException - if the delimiter is empty and the AlphabetContainer is not discrete
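
The role of the delimiter in the create methods can be illustrated with a minimal standalone sketch that turns a String into discrete codes over a single discrete alphabet. It does not call Jstacs: the alphabet is a hypothetical String array, the code of a symbol is assumed to be its index in that array, and an IllegalArgumentException stands in for WrongAlphabetException.

```java
// Standalone sketch: parse a String into discrete codes.
// An empty delimiter means that every character is one symbol.
final class ParseSketch {

    static int[] parse(String[] alphabet, String sequence, String delim) {
        String[] tokens;
        if (delim.isEmpty()) {
            tokens = new String[sequence.length()];
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = String.valueOf(sequence.charAt(i));
            }
        } else {
            // quote the delimiter so that characters like '|' are taken literally
            tokens = sequence.split(java.util.regex.Pattern.quote(delim));
        }
        int[] codes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            codes[i] = indexOf(alphabet, tokens[i]);
            if (codes[i] < 0) {
                // stands in for WrongAlphabetException in this sketch
                throw new IllegalArgumentException("symbol not in alphabet: " + tokens[i]);
            }
        }
        return codes;
    }

    private static int indexOf(String[] alphabet, String symbol) {
        for (int i = 0; i < alphabet.length; i++) {
            if (alphabet[i].equals(symbol)) {
                return i;
            }
        }
        return -1;
    }
}
```

An empty delimiter only works when every symbol is a single character, which is consistent with the documented IllegalArgumentException for an empty delimiter on a non-discrete AlphabetContainer.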

reverse

public final Sequence reverse()
                       throws OperationNotSupportedException
This method returns a new instance of Sequence containing the reverse of the current Sequence.
So invoking this method, for instance, on the sequence "TAATA" returns "ATAAT".

Returns:
the reverse Sequence
Throws:
OperationNotSupportedException - if the current Sequence is based on an AlphabetContainer that is not simple
See Also:
reverse(int, int)

reverse

public Sequence reverse(int start,
                        int end)
                 throws OperationNotSupportedException
This method returns a new instance of Sequence containing the reverse of a part of the current Sequence.

Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the reverse Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is based on an AlphabetContainer that is not simple

complement

public Sequence complement()
                    throws OperationNotSupportedException
This method returns a new instance of Sequence containing the complement of the current Sequence.
So invoking this method, for instance, on the sequence "TAATA" with an AlphabetContainer on DNAAlphabet returns "ATTAT".

Returns:
the complementary Sequence
Throws:
OperationNotSupportedException - if the current Sequence is not based on a ComplementableDiscreteAlphabet
See Also:
ComplementableDiscreteAlphabet, complement(int, int)

reverseComplement

public Sequence reverseComplement()
                           throws OperationNotSupportedException
This method returns a new instance of Sequence containing the reverse complement of the current Sequence. For more details see the methods reverse() and complement().

Returns:
the reverse complementary Sequence
Throws:
OperationNotSupportedException - if the current Sequence is not discrete and simple (not based on a ComplementableDiscreteAlphabet)
See Also:
reverse(), complement(), reverseComplement(int, int), ComplementableDiscreteAlphabet
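
The relation between reverse, complement and reverse complement can be checked on the "TAATA" example with a minimal standalone sketch on plain DNA strings. It does not use the Jstacs classes; the class name is hypothetical and the pairing A/T, C/G is the usual DNA complement.

```java
// Standalone sketch on plain Strings over the DNA symbols A, C, G, T.
final class DnaStrandSketch {

    // Reverses the order of the symbols.
    static String reverse(String dna) {
        return new StringBuilder(dna).reverse().toString();
    }

    // Replaces every base by its partner (A <-> T, C <-> G), keeping the order.
    static String complement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            char c = dna.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:
                    throw new IllegalArgumentException("not a DNA symbol: " + c);
            }
        }
        return sb.toString();
    }

    // Complements first, then reverses; the opposite order gives the same result.
    static String reverseComplement(String dna) {
        return reverse(complement(dna));
    }

    public static void main(String[] args) {
        System.out.println(reverse("TAATA"));           // ATAAT
        System.out.println(complement("TAATA"));        // ATTAT
        System.out.println(reverseComplement("TAATA")); // TATTA
    }
}
```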

complement

public Sequence complement(int start,
                           int end)
                    throws OperationNotSupportedException
This method returns a new instance of Sequence containing the complement of a part of the current Sequence.
So invoking this method, for instance, on the sequence "TAATA" with an AlphabetContainer on DNAAlphabet returns "ATTAT".

Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the complementary Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is not based on a ComplementableDiscreteAlphabet
See Also:
ComplementableDiscreteAlphabet

reverseComplement

public Sequence reverseComplement(int start,
                                  int end)
                           throws OperationNotSupportedException
This method returns a new instance of Sequence containing the reverse complement of a part of the current Sequence. For more details see the methods reverse() and complement().

Parameters:
start - the start position (inclusive) in the original Sequence
end - the end position (exclusive) in the original Sequence
Returns:
the reverse complementary Sequence of the part
Throws:
OperationNotSupportedException - if the current Sequence is not discrete and simple (not based on a ComplementableDiscreteAlphabet)
See Also:
reverse(), complement(), ComplementableDiscreteAlphabet

hashCode

public int hashCode()
Overrides:
hashCode in class Object

hashCodeForPos

protected abstract int hashCodeForPos(int pos)
This method is used in hashCode() and returns the hash code for one specific position.

Parameters:
pos - the position
Returns:
the hash code for the position

getHammingDistance

public int getHammingDistance(Sequence seq)
                       throws WrongAlphabetException
This method returns the Hamming distance between the current Sequence and seq. If the sequences have different lengths, -1 is returned.

Parameters:
seq - the sequence to be compared
Returns:
the Hamming distance
Throws:
WrongAlphabetException - if the sequences have different AlphabetContainers
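
The Hamming distance counts the positions at which two sequences of equal length differ. A minimal standalone sketch on plain Strings, following the documented convention of returning -1 for different lengths; the class name is hypothetical and the alphabet check of the real method is left out.

```java
// Standalone sketch: Hamming distance between two Strings.
final class HammingSketch {

    // Returns the number of differing positions, or -1 if the lengths differ.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            return -1;
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(hammingDistance("TAATA", "TATTA")); // 1
        System.out.println(hammingDistance("ACGT", "ACG"));    // -1
    }
}
```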

matches

public boolean matches(int maxHammingDistance,
                       Sequence shortSequence)
                throws WrongAlphabetException
This method answers the question whether a match of shortSequence with a given maximal number of mismatches can be found in the current Sequence.

Parameters:
maxHammingDistance - the maximal Hamming distance
shortSequence - the short sequence
Returns:
true if a match with maximal Hamming distance smaller than maxHammingDistance exists, otherwise false
Throws:
WrongAlphabetException - if the sequences have different AlphabetContainers
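
A search with mismatches can be sketched as a sliding window that compares the short sequence against every offset of the long one. The following standalone sketch works on plain Strings and accepts a window with at most maxMismatches mismatches. Treating the bound as inclusive is an assumption, as are the class and parameter names.

```java
// Standalone sketch: is there an offset at which shortSeq fits into longSeq
// with at most maxMismatches mismatching positions?
final class MismatchScanSketch {

    static boolean matches(String longSeq, String shortSeq, int maxMismatches) {
        for (int offset = 0; offset + shortSeq.length() <= longSeq.length(); offset++) {
            int mismatches = 0;
            // stop comparing this window as soon as the bound is exceeded
            for (int i = 0; i < shortSeq.length() && mismatches <= maxMismatches; i++) {
                if (longSeq.charAt(offset + i) != shortSeq.charAt(i)) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatches) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // the window GTTG at offset 2 differs from GTAG in one position
        System.out.println(matches("ACGTTGCA", "GTAG", 1)); // true
        System.out.println(matches("ACGTTGCA", "GTAG", 0)); // false
    }
}
```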

isMultiDimensional

public abstract boolean isMultiDimensional()
The method returns true if the sequence is multidimensional, otherwise false.

Returns:
true if the sequence is multidimensional, otherwise false

getEmptyContainer

public abstract T getEmptyContainer()
The method returns a container that can be used for accessing the symbols for each position. This is especially of interest for multidimensional sequences.

Returns:
a container that can be used for accessing the symbols for each position
See Also:
fillContainer(Object, int), isMultiDimensional()

fillContainer

public abstract void fillContainer(T container,
                                   int pos)
The method fills the content of a specific position into the container. This is especially of interest for multidimensional sequences.

Parameters:
container - the container which is used for filling the content.
pos - the position
See Also:
getEmptyContainer(), isMultiDimensional()
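
The container methods allow position-wise access without allocating a new object per position: a container is created once with getEmptyContainer() and refilled with fillContainer(T, int). The following standalone sketch shows this pattern for a two-dimensional sequence and uses it for a position-wise comparison. How compareTo(Sequence) is actually implemented, including the tie-break by length, is an assumption; all class names are hypothetical.

```java
// Standalone sketch of container-based access and comparison.
abstract class ContainerSketch<T> {
    abstract int getLength();

    abstract T getEmptyContainer();

    abstract void fillContainer(T container, int pos);

    abstract int compareTo(T t1, T t2);

    // Compares position by position and reuses one container per sequence.
    int compareTo(ContainerSketch<T> other) {
        T mine = getEmptyContainer();
        T theirs = other.getEmptyContainer();
        int common = Math.min(getLength(), other.getLength());
        for (int pos = 0; pos < common; pos++) {
            fillContainer(mine, pos);
            other.fillContainer(theirs, pos);
            int c = compareTo(mine, theirs);
            if (c != 0) {
                return c;
            }
        }
        // equal on the common prefix: the shorter sequence comes first (an assumption)
        return Integer.compare(getLength(), other.getLength());
    }
}

// A two-dimensional sequence: every position holds two int values.
final class TwoTrackSketch extends ContainerSketch<int[]> {
    private final int[][] values; // values[pos] holds both dimensions of position pos

    TwoTrackSketch(int[][] values) {
        this.values = values;
    }

    int getLength() {
        return values.length;
    }

    int[] getEmptyContainer() {
        return new int[2];
    }

    void fillContainer(int[] container, int pos) {
        container[0] = values[pos][0];
        container[1] = values[pos][1];
    }

    int compareTo(int[] t1, int[] t2) {
        int c = Integer.compare(t1[0], t2[0]);
        return c != 0 ? c : Integer.compare(t1[1], t2[1]);
    }
}
```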