de.jstacs.data
Class AlphabetContainer

java.lang.Object
  extended by de.jstacs.data.AlphabetContainer
All Implemented Interfaces:
InstantiableFromParameterSet, Storable, Comparable<AlphabetContainer>

public class AlphabetContainer
extends Object
implements Storable, InstantiableFromParameterSet, Comparable<AlphabetContainer>

A container for the alphabets used in a sequence, sample, model, etc. The container allows a different alphabet at each position, or at least alphabets that are not identical at all positions. This is impossible with instances of Alphabet alone. The container maps the given Alphabet objects to the positions.

AlphabetContainer is immutable.

Author:
Jens Keilwagen
See Also:
Alphabet

Constructor Summary
AlphabetContainer(Alphabet abc)
          This constructor creates a simple AlphabetContainer.
AlphabetContainer(Alphabet[] abc)
          This constructor creates an AlphabetContainer with different alphabets for each position.
AlphabetContainer(Alphabet[] abc, int[] assignment)
          This constructor creates an AlphabetContainer that uses different alphabets.
AlphabetContainer(AlphabetContainer[] cons, int[] lengths)
          This constructor creates a new sparse AlphabetContainer based on given AlphabetContainers.
AlphabetContainer(AlphabetContainerParameterSet parameters)
          Creates a new AlphabetContainer from an AlphabetContainerParameterSet that contains all necessary parameters.
AlphabetContainer(StringBuffer xml)
          Extracts the AlphabetContainer from the StringBuffer.
 
Method Summary
 boolean checkConsistency(AlphabetContainer abc)
          Checks whether two alphabet containers are consistent.
 int compareTo(AlphabetContainer abc)
           
 Alphabet getAlphabetAt(int pos)
          Returns the underlying alphabet of position pos.
 double getAlphabetLengthAt(int pos)
          Returns the length of the underlying alphabet of position pos.
 double getCode(int pos, String sym)
          Returns the encoded symbol sym for position pos.
 AlphabetContainer getCompositeContainer(int[] start, int[] length)
          This method returns a container of alphabets e.g. for composite motifs/sequences.
 AlphabetContainerParameterSet getCurrentParameterSet()
          Returns the ParameterSet that has been used to instantiate the current instance of the implementing class.
 String getDelim()
          Returns the delimiter that should be used (for writing e.g. a sequence).
 double getMaximalAlphabetLength()
          Returns the maximal alphabet length of this container.
 double getMin(int pos)
          Returns the minimal value of the underlying alphabet of position pos.
 double getMinimalAlphabetLength()
          Returns the minimal alphabet length of this container.
 int getPossibleLength()
          Returns the possible length for sequences, etc. using this container.
static AlphabetContainer getSimplifiedAlphabetContainer(Alphabet[] abc, int[] assignment)
          This method creates a new AlphabetContainer that uses as few alphabets as possible to describe the container.
 AlphabetContainer getSubContainer(int start, int length)
          This method returns a subcontainer for the positions starting at start and with length length.
 String getSymbol(int pos, double val)
          This method returns a String representation of val.
 boolean hasEveryWhereSameAlphabetSize()
          Returns true if all used alphabets have the same alphabet size, otherwise false.
 boolean ignoresCase()
          If this method returns true all used alphabets ignore the case.
static AlphabetContainer insertAlphabet(AlphabetContainer aC, Alphabet a, boolean[] useNewAlphabet)
          This method may be used to construct a new AlphabetContainer by incorporating additional alphabets into an existing AlphabetContainer.
 boolean isDiscrete()
          If this method returns true, all positions use discrete values.
 boolean isDiscreteAt(int pos)
          Returns true if position pos is a discrete random variable.
 boolean isEncodedSymbol(int pos, double continuous)
          Returns true if continuous is a symbol of the alphabet used in position pos.
 boolean isReverseComplementable()
          This method helps to determine whether the AlphabetContainer allows computing the reverse complement of a sequence.
 boolean isSimple()
          This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet.
 int toDiscrete(int pos, double val)
           
 String toString()
           
 StringBuffer toXML()
          This method returns an XML-representation of an instance of the implementing class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AlphabetContainer

public AlphabetContainer(Alphabet abc)
This constructor creates a simple AlphabetContainer. All positions use the same alphabet, and therefore sequences of arbitrary length can be handled.

Parameters:
abc - the alphabet

AlphabetContainer

public AlphabetContainer(Alphabet[] abc)
This constructor creates an AlphabetContainer with different alphabets for each position. The assignment from the alphabets to the positions is given by the order in the alphabet array. This constructor should only be used if all alphabets are pairwise different.

Parameters:
abc - the alphabets
See Also:
AlphabetContainer(Alphabet[], int[])

AlphabetContainer

public AlphabetContainer(AlphabetContainer[] cons,
                         int[] lengths)
                  throws IllegalArgumentException
This constructor creates a new sparse AlphabetContainer based on given AlphabetContainers.

Parameters:
cons - the AlphabetContainers
lengths - the corresponding lengths for which each AlphabetContainer is used
Throws:
IllegalArgumentException - if the given length for an AlphabetContainer is not possible

AlphabetContainer

public AlphabetContainer(Alphabet[] abc,
                         int[] assignment)
                  throws IllegalArgumentException
This constructor creates an AlphabetContainer that uses different alphabets. The alphabets can be used more than once. The assignment for the alphabets to the positions is given by the assignment array.

Parameters:
abc - the alphabets
assignment - the assignment array
Throws:
IllegalArgumentException - if the assignment from the alphabets to the positions is not correct
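
The role of the assignment array can be illustrated with a stand-alone sketch that does not use any Jstacs classes. Plain strings stand in for Alphabet objects, and AssignmentSketch and resolve are hypothetical names. The sketch assumes that assignment[i] is the index into abc of the alphabet used at position i; the exact validation performed by the real constructor is not documented here, so the range check below is an assumption.

```java
import java.util.Arrays;

// Stand-alone illustration; strings stand in for Alphabet objects.
class AssignmentSketch {

    // Hypothetical helper: returns the alphabet label used at every position.
    // Assumes assignment[pos] is an index into abc.
    static String[] resolve(String[] abc, int[] assignment) {
        String[] perPosition = new String[assignment.length];
        for (int pos = 0; pos < assignment.length; pos++) {
            int idx = assignment[pos];
            if (idx < 0 || idx >= abc.length) {
                // Assumed behaviour for an incorrect assignment.
                throw new IllegalArgumentException(
                        "position " + pos + " refers to unknown alphabet index " + idx);
            }
            perPosition[pos] = abc[idx];
        }
        return perPosition;
    }

    public static void main(String[] args) {
        String[] abc = { "DNA", "continuous" };
        int[] assignment = { 0, 0, 1, 0 };
        // prints [DNA, DNA, continuous, DNA]
        System.out.println(Arrays.toString(resolve(abc, assignment)));
    }
}
```

In this reading, the alphabet "DNA" is reused at three positions while only being stored once.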

AlphabetContainer

public AlphabetContainer(AlphabetContainerParameterSet parameters)
                  throws IllegalArgumentException,
                         DoubleSymbolException,
                         ParameterSetParser.NotInstantiableException
Creates a new AlphabetContainer from an AlphabetContainerParameterSet that contains all necessary parameters.

Parameters:
parameters - the parameters
Throws:
IllegalArgumentException - is thrown if the AlphabetContainerParameterSet is not cloneable
DoubleSymbolException - is thrown if the definitions within parameters contain a duplicate symbol
ParameterSetParser.NotInstantiableException

AlphabetContainer

public AlphabetContainer(StringBuffer xml)
                  throws NonParsableException
Extracts the AlphabetContainer from the StringBuffer.

Parameters:
xml - the XML stream
Throws:
NonParsableException - if the stream is not parsable
Method Detail

getSimplifiedAlphabetContainer

public static AlphabetContainer getSimplifiedAlphabetContainer(Alphabet[] abc,
                                                               int[] assignment)
This method creates a new AlphabetContainer that uses as few alphabets as possible to describe the container. Alphabets are reused wherever possible.

Parameters:
abc - the alphabets
assignment - the assignment of the alphabets to the positions
Returns:
an AlphabetContainer that uses as few alphabets as possible
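
The idea of reusing alphabets can be sketched without Jstacs as follows. This is only an illustration under stated assumptions: strings stand in for Alphabet objects, two alphabets count as "the same" when their labels are equal (the criterion used by the real method is not documented here), and SimplifySketch and simplify are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; strings stand in for Alphabet objects.
class SimplifySketch {

    // Hypothetical helper: collects each alphabet once (in first-seen order) in
    // 'distinct' and returns the assignment rewritten to index into 'distinct'.
    static int[] simplify(String[] abc, int[] assignment, List<String> distinct) {
        int[] simplified = new int[assignment.length];
        for (int pos = 0; pos < assignment.length; pos++) {
            String alphabet = abc[assignment[pos]];
            int idx = distinct.indexOf(alphabet);
            if (idx < 0) {
                // First occurrence: register the alphabet.
                distinct.add(alphabet);
                idx = distinct.size() - 1;
            }
            simplified[pos] = idx;
        }
        return simplified;
    }

    public static void main(String[] args) {
        String[] abc = { "DNA", "DNA", "protein" };
        int[] assignment = { 0, 1, 2, 1 };
        List<String> distinct = new ArrayList<>();
        int[] simplified = simplify(abc, assignment, distinct);
        System.out.println(distinct);                    // prints [DNA, protein]
        System.out.println(Arrays.toString(simplified)); // prints [0, 0, 1, 0]
    }
}
```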

insertAlphabet

public static AlphabetContainer insertAlphabet(AlphabetContainer aC,
                                               Alphabet a,
                                               boolean[] useNewAlphabet)
                                        throws IllegalArgumentException
This method may be used to construct a new AlphabetContainer by incorporating additional alphabets into an existing AlphabetContainer.

Parameters:
aC - the AlphabetContainer used as template for the returned AlphabetContainer
a - this Alphabet should be inserted
useNewAlphabet - this array defines which alphabet is used at which position:

  • the length of this array defines the length of the sequences that the returned AlphabetContainer can handle
  • for each position, false forces use of the alphabet given by the given AlphabetContainer
  • for each position, true forces use of the given Alphabet

Incorporating the additional alphabet must be understood as defining new, additional positions at which the given alphabet is used. For example, let the given AlphabetContainer contain three alphabets A0, A1, A2 (Ai for position i) and have a possible length of three. Calling this method with this AlphabetContainer, an additional Alphabet A3, and the assignment array false, true, true, false, false returns a new AlphabetContainer that has a possible length of five and uses the following alphabets for those positions: A0, A3, A3, A1, A2. If the given AlphabetContainer has a possible length not equal to zero, the assignment array must contain as many false values as the length of the given AlphabetContainer.
Returns:
a new AlphabetContainer as described above
Throws:
IllegalArgumentException - if useNewAlphabet is null or has length 0
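
The worked example for this method can be reproduced with a stand-alone sketch that does not use Jstacs. Strings stand in for Alphabet objects, InsertAlphabetSketch and insert are hypothetical names, and only a template with a fixed possible length is modelled. The check on the number of false values is an assumption derived from the description; the documented exception covers only a null or empty array.

```java
import java.util.Arrays;

// Stand-alone illustration; strings stand in for Alphabet objects.
class InsertAlphabetSketch {

    // Hypothetical helper: 'template' lists the alphabets of a container with a
    // fixed possible length, 'added' is the alphabet to insert.
    static String[] insert(String[] template, String added, boolean[] useNewAlphabet) {
        if (useNewAlphabet == null || useNewAlphabet.length == 0) {
            throw new IllegalArgumentException("useNewAlphabet must be non-empty");
        }
        String[] result = new String[useNewAlphabet.length];
        int next = 0; // next unused position of the template
        for (int pos = 0; pos < useNewAlphabet.length; pos++) {
            if (useNewAlphabet[pos]) {
                result[pos] = added;
            } else {
                if (next >= template.length) {
                    throw new IllegalArgumentException("too many false values");
                }
                result[pos] = template[next++];
            }
        }
        if (next != template.length) {
            // Assumed check: every template position must be used exactly once.
            throw new IllegalArgumentException("too few false values");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] template = { "A0", "A1", "A2" };
        boolean[] flags = { false, true, true, false, false };
        // prints [A0, A3, A3, A1, A2]
        System.out.println(Arrays.toString(insert(template, "A3", flags)));
    }
}
```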

checkConsistency

public boolean checkConsistency(AlphabetContainer abc)
Checks whether two alphabet containers are consistent.

Parameters:
abc - the second alphabet container
Returns:
whether the two alphabet containers are consistent

compareTo

public int compareTo(AlphabetContainer abc)
Specified by:
compareTo in interface Comparable<AlphabetContainer>

getAlphabetAt

public Alphabet getAlphabetAt(int pos)
Returns the underlying alphabet of position pos. Please note that the alphabet is returned by reference, so be careful with what you do with it!

Parameters:
pos - the position
Returns:
the alphabet

getAlphabetLengthAt

public double getAlphabetLengthAt(int pos)
Returns the length of the underlying alphabet of position pos.

Parameters:
pos - the position
Returns:
the length of the underlying alphabet of position pos

getCode

public double getCode(int pos,
                      String sym)
               throws WrongAlphabetException
Returns the encoded symbol sym for position pos.

Parameters:
pos - the position
sym - the symbol
Returns:
the encoded symbol
Throws:
WrongAlphabetException
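
The relation between encoding a symbol and decoding a value (see getSymbol) can be sketched for a single discrete alphabet without Jstacs. The sketch assumes that a discrete symbol is encoded as its index in the alphabet, stored as a double; this is an assumption for illustration, not a statement about the library's internals. CodeSketch and its methods are hypothetical, and IllegalArgumentException stands in for WrongAlphabetException.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration of symbol <-> code conversion for one position.
class CodeSketch {

    static final List<String> DNA = Arrays.asList("A", "C", "G", "T");

    // Hypothetical encoder: the code is assumed to be the index of the symbol.
    static double getCode(List<String> alphabet, String sym) {
        int idx = alphabet.indexOf(sym);
        if (idx < 0) {
            throw new IllegalArgumentException("symbol not in alphabet: " + sym);
        }
        return idx;
    }

    // Hypothetical decoder: inverse of getCode for valid codes.
    static String getSymbol(List<String> alphabet, double val) {
        return alphabet.get((int) val);
    }

    public static void main(String[] args) {
        double code = getCode(DNA, "G");
        System.out.println(code);                 // prints 2.0
        System.out.println(getSymbol(DNA, code)); // prints G
    }
}
```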

getCompositeContainer

public AlphabetContainer getCompositeContainer(int[] start,
                                               int[] length)
This method returns a container of alphabets e.g. for composite motifs/sequences.

Parameters:
start - the array of start indices
length - the array of lengths
Returns:
the container
See Also:
getSubContainer(int, int)
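
One plausible reading of the start and length arrays is that each pair (start[i], length[i]) selects a block of positions and that the blocks are concatenated in order. The following stand-alone sketch illustrates that reading only; it does not use Jstacs, strings stand in for Alphabet objects, and CompositeSketch and composite are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; strings stand in for the per-position alphabets.
class CompositeSketch {

    // Hypothetical helper: concatenates the blocks [start[i], start[i] + length[i]).
    static String[] composite(String[] perPosition, int[] start, int[] length) {
        if (start.length != length.length) {
            throw new IllegalArgumentException("start and length must have the same size");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < start.length; i++) {
            if (start[i] < 0 || length[i] < 0 || start[i] + length[i] > perPosition.length) {
                throw new IllegalArgumentException("block " + i + " is out of range");
            }
            for (int pos = start[i]; pos < start[i] + length[i]; pos++) {
                result.add(perPosition[pos]);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] perPosition = { "A0", "A1", "A2", "A3", "A4" };
        // Two blocks: positions 0-1 and positions 3-4.
        String[] combined = composite(perPosition, new int[] { 0, 3 }, new int[] { 2, 2 });
        System.out.println(Arrays.toString(combined)); // prints [A0, A1, A3, A4]
    }
}
```

With a single block, the same helper behaves like a subcontainer selection for one start and one length.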

getCurrentParameterSet

public AlphabetContainerParameterSet getCurrentParameterSet()
                                                     throws Exception
Description copied from interface: InstantiableFromParameterSet
Returns the ParameterSet that has been used to instantiate the current instance of the implementing class. If the current instance was not created using a ParameterSet, an equivalent ParameterSet should be returned, such that an instance created using this ParameterSet would be in principle equal to the current instance.

Specified by:
getCurrentParameterSet in interface InstantiableFromParameterSet
Returns:
the current ParameterSet
Throws:
Exception - is thrown if the ParameterSet could not be returned

getDelim

public String getDelim()
Returns the delimiter that should be used (for writing e.g. a sequence).

Returns:
the delimiter

getMaximalAlphabetLength

public double getMaximalAlphabetLength()
Returns the maximal alphabet length of this container.

Returns:
the maximal alphabet length of this container

getMin

public double getMin(int pos)
Returns the minimal value of the underlying alphabet of position pos.

Parameters:
pos - the position
Returns:
the minimal value

getMinimalAlphabetLength

public double getMinimalAlphabetLength()
Returns the minimal alphabet length of this container.

Returns:
the minimal alphabet length of this container

getPossibleLength

public int getPossibleLength()
Returns the possible length for sequences, etc. using this container. If 0 (zero) is returned, all lengths are possible.

Returns:
the possible length using this container
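
Because 0 means "any length", callers need to treat it as a special case when checking whether a sequence fits a container. A minimal stand-alone sketch of such a check follows; LengthCheckSketch and accepts are hypothetical names, and possibleLength is assumed to be the value returned by getPossibleLength().

```java
// Stand-alone illustration of interpreting the possible length.
class LengthCheckSketch {

    // Hypothetical helper: true if a sequence of the given length is compatible
    // with a container reporting 'possibleLength' (0 means all lengths are possible).
    static boolean accepts(int possibleLength, int sequenceLength) {
        return possibleLength == 0 || possibleLength == sequenceLength;
    }

    public static void main(String[] args) {
        System.out.println(accepts(0, 17)); // prints true
        System.out.println(accepts(5, 5));  // prints true
        System.out.println(accepts(5, 6));  // prints false
    }
}
```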

getSubContainer

public AlphabetContainer getSubContainer(int start,
                                         int length)
This method returns a subcontainer for the positions starting at start and with length length. The method can be used for subsequences, etc.

Parameters:
start - the index of the start position
length - the length
Returns:
the subcontainer of alphabets
See Also:
getCompositeContainer(int[], int[])

getSymbol

public String getSymbol(int pos,
                        double val)
This method returns a String representation of val.

Parameters:
pos - the position
val - the value
Returns:
a string representation for val at position pos

hasEveryWhereSameAlphabetSize

public final boolean hasEveryWhereSameAlphabetSize()
Returns true if all used alphabets have the same alphabet size, otherwise false. That is, it returns getMaximalAlphabetLength() == getMinimalAlphabetLength().

Returns:
true if all used alphabets have the same alphabet size
See Also:
getMaximalAlphabetLength(), getMinimalAlphabetLength()
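
The comparison of maximal and minimal alphabet length can be sketched without Jstacs as follows. SameSizeSketch and sameSize are hypothetical names, and the array of per-alphabet lengths is an assumed input; the sketch only mirrors the equality stated above.

```java
// Stand-alone illustration of the "maximal == minimal" check.
class SameSizeSketch {

    // Hypothetical helper: 'lengths' holds the length of every used alphabet
    // and is expected to be non-empty.
    static boolean sameSize(double[] lengths) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double length : lengths) {
            min = Math.min(min, length);
            max = Math.max(max, length);
        }
        return max == min;
    }

    public static void main(String[] args) {
        System.out.println(sameSize(new double[] { 4, 4, 4 })); // prints true
        System.out.println(sameSize(new double[] { 4, 20 }));   // prints false
    }
}
```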

ignoresCase

public final boolean ignoresCase()
If this method returns true all used alphabets ignore the case.

Returns:
true if all used alphabets ignore the case

isDiscrete

public final boolean isDiscrete()
If this method returns true, all positions use discrete values.

Returns:
true if all positions use discrete values

isDiscreteAt

public boolean isDiscreteAt(int pos)
Returns true if position pos is a discrete random variable.

Parameters:
pos - the position
Returns:
true if position pos is a discrete random variable

isEncodedSymbol

public boolean isEncodedSymbol(int pos,
                               double continuous)
Returns true if continuous is a symbol of the alphabet used in position pos.

Parameters:
pos - the position
continuous - the continuous value
Returns:
true if continuous is a symbol of the alphabet used in position pos

isSimple

public final boolean isSimple()
This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet.

Returns:
whether all random variables are defined over the same range

isReverseComplementable

public final boolean isReverseComplementable()
This method helps to determine whether the AlphabetContainer allows computing the reverse complement of a sequence.

Returns:
true if the AlphabetContainer allows computing the reverse complement of a sequence

toDiscrete

public int toDiscrete(int pos,
                      double val)
Parameters:
pos - the position
val - the value
Returns:
a discrete value for val at position pos

toString

public String toString()
Overrides:
toString in class Object

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML-representation of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML-representation