de.jstacs.data
Class AlphabetContainer

java.lang.Object
  extended by de.jstacs.data.AlphabetContainer
All Implemented Interfaces:
InstantiableFromParameterSet, Storable, Comparable<AlphabetContainer>
Direct Known Subclasses:
DNAAlphabetContainer

public class AlphabetContainer
extends Object
implements Storable, InstantiableFromParameterSet, Comparable<AlphabetContainer>

The container for Alphabets used in a Sequence, DataSet, AbstractTrainableStatisticalModel and similar classes. The container lets the user assign a different Alphabet to each position, or at least avoid using the same Alphabet at all positions. This is impossible with instances of Alphabet alone. The container maps the given Alphabet objects to the positions.

AlphabetContainer is immutable.

Author:
Jens Keilwagen
See Also:
Alphabet

Nested Class Summary
static class AlphabetContainer.AbstractAlphabetContainerParameterSet<T extends AlphabetContainer>
          This class is the super class of any InstanceParameterSet for AlphabetContainer.
static class AlphabetContainer.AlphabetContainerType
          This enum defines types of AlphabetContainers.
 
Field Summary
protected  AlphabetContainer.AbstractAlphabetContainerParameterSet<?> parameters
          The parameters for this instance.
 
Constructor Summary
AlphabetContainer(Alphabet... abc)
          Creates a new AlphabetContainer with different Alphabets for each position.
AlphabetContainer(Alphabet abc)
          Creates a new simple AlphabetContainer.
AlphabetContainer(Alphabet[] abc, int[] assignment)
          Creates a new AlphabetContainer that uses different Alphabets.
AlphabetContainer(AlphabetContainer[] cons, int[] lengths)
          Creates a new sparse AlphabetContainer based on given AlphabetContainers.
AlphabetContainer(AlphabetContainerParameterSet parameters)
          Creates a new AlphabetContainer from an AlphabetContainerParameterSet that contains all necessary parameters.
AlphabetContainer(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
 boolean checkConsistency(AlphabetContainer abc)
          Checks if this AlphabetContainer is consistent with another AlphabetContainer.
 int compareTo(AlphabetContainer abc)
           
 Alphabet getAlphabetAt(int pos)
          Returns the underlying Alphabet of position pos.
 int getAlphabetIndexForPosition(int pos)
          This method returns the index of the Alphabet that is used for the given position.
 double getAlphabetLengthAt(int pos)
          Returns the length of the underlying Alphabet of position pos.
 double getCode(int pos, String sym)
          Returns the encoded symbol for sym of the Alphabet of position pos of this AlphabetContainer.
 AlphabetContainer getCompositeContainer(int[] start, int[] length)
          Returns an AlphabetContainer of Alphabets e.g. for composite motifs/sequences.
 AlphabetContainer.AbstractAlphabetContainerParameterSet<? extends AlphabetContainer> getCurrentParameterSet()
          Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class.
 String getDelim()
          Returns the delimiter that should be used (for writing e.g. a sequence).
 int[] getIndexForAlphabets()
          This method returns an object that is used for assigning the positions of the Sequences to specific Alphabets.
 double getMaximalAlphabetLength()
          Returns the maximal Alphabet length of this AlphabetContainer.
 double getMin(int pos)
          Returns the minimal value of the underlying Alphabet of position pos.
 double getMinimalAlphabetLength()
          Returns the minimal Alphabet length of this AlphabetContainer.
 int getNumberOfAlphabets()
          This method returns the number of Alphabets used in the current AlphabetContainer.
 int getPossibleLength()
          Returns the possible length for Sequences using this AlphabetContainer.
static AlphabetContainer getSimplifiedAlphabetContainer(Alphabet[] abc, int[] assignment)
          This method creates a new AlphabetContainer that uses as few Alphabets as possible to describe the container.
 AlphabetContainer getSubContainer(int start, int length)
          Returns a sub-container with the Alphabets for the positions starting at start and with length length.
 String getSymbol(int pos, double val)
          Returns a String representation of the encoded symbol val of the Alphabet of position pos of this AlphabetContainer.
 AlphabetContainer.AlphabetContainerType getType()
          Returns the type of this AlphabetContainer.
 boolean ignoresCase()
          Indicates if all used Alphabets ignore the case.
static AlphabetContainer insertAlphabet(AlphabetContainer aC, Alphabet a, boolean[] useNewAlphabet)
          This method may be used to construct a new AlphabetContainer by incorporating additional Alphabets into an existing AlphabetContainer.
 boolean isDiscrete()
          Indicates if all positions use discrete Alphabets.
 boolean isDiscreteAt(int pos)
          Indicates if position pos is a discrete random variable, i.e. if the Alphabet of position pos is discrete.
 boolean isEncodedSymbol(int pos, double continuous)
          Indicates if continuous is a symbol of the Alphabet used at position pos of the AlphabetContainer.
 boolean isReverseComplementable()
          This method helps to determine if the AlphabetContainer also computes the reverse complement of a Sequence.
 boolean isSimple()
          Indicates whether all random variables are defined over the same range, i.e. if the AlphabetContainer is simple and all positions use the same (fixed) Alphabet.
 int toDiscrete(int pos, double val)
          Returns the discrete value for val of the Alphabet of position pos in the AlphabetContainer.
 String toString()
           
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

parameters

protected AlphabetContainer.AbstractAlphabetContainerParameterSet<?> parameters
The parameters for this instance.

Constructor Detail

AlphabetContainer

public AlphabetContainer(Alphabet abc)
Creates a new simple AlphabetContainer. All positions use the same Alphabet and therefore sequences of arbitrary length can be handled.

Parameters:
abc - the Alphabet for all positions

AlphabetContainer

public AlphabetContainer(Alphabet... abc)
Creates a new AlphabetContainer with different Alphabets for each position. The assignment of the Alphabets to the positions is given by the order in the Alphabet array. This constructor should only be used if all Alphabets are pairwise different.

Parameters:
abc - the different Alphabets for each position
See Also:
AlphabetContainer(Alphabet[], int[])

AlphabetContainer

public AlphabetContainer(AlphabetContainer[] cons,
                         int[] lengths)
                  throws IllegalArgumentException
Creates a new sparse AlphabetContainer based on given AlphabetContainers.

Parameters:
cons - the given AlphabetContainers
lengths - the corresponding lengths of each AlphabetContainer that is used
Throws:
IllegalArgumentException - if the given length for an AlphabetContainer is not possible

AlphabetContainer

public AlphabetContainer(Alphabet[] abc,
                         int[] assignment)
                  throws IllegalArgumentException
Creates a new AlphabetContainer that uses different Alphabets. The Alphabets can be used more than once. The assignment for the Alphabets to the positions is given by the array assignment.

Parameters:
abc - the Alphabets
assignment - the assignment array
Throws:
IllegalArgumentException - if the assignment of the Alphabets to the positions is not correct
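
The role of the assignment array can be illustrated with a small standalone model. This is only a sketch and not the library's implementation: Alphabets are represented by plain names, the class AssignmentSketch and the method resolve are hypothetical, and the assumption that assignment[pos] holds the index into abc of the Alphabet used at position pos (with out-of-range indices rejected) is made for illustration.

```java
import java.util.Arrays;

public class AssignmentSketch {

    // Hypothetical helper: returns the alphabet name used at each position.
    // Assumption: assignment[pos] is an index into abc.
    public static String[] resolve(String[] abc, int[] assignment) {
        String[] perPosition = new String[assignment.length];
        for (int pos = 0; pos < assignment.length; pos++) {
            int idx = assignment[pos];
            if (idx < 0 || idx >= abc.length) {
                // Assumed notion of an "incorrect" assignment for this sketch.
                throw new IllegalArgumentException(
                        "position " + pos + " refers to unknown alphabet index " + idx);
            }
            perPosition[pos] = abc[idx];
        }
        return perPosition;
    }

    public static void main(String[] args) {
        String[] abc = {"DNA", "continuous"};
        int[] assignment = {0, 0, 1, 0}; // the first Alphabet is used three times
        System.out.println(Arrays.toString(resolve(abc, assignment)));
    }
}
```

In this model a single Alphabet can serve several positions, which is what distinguishes this constructor from AlphabetContainer(Alphabet...).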

AlphabetContainer

public AlphabetContainer(AlphabetContainerParameterSet parameters)
                  throws IllegalArgumentException,
                         DoubleSymbolException,
                         ParameterSetParser.NotInstantiableException
Creates a new AlphabetContainer from an AlphabetContainerParameterSet that contains all necessary parameters.

Parameters:
parameters - the parameter set
Throws:
IllegalArgumentException - if something is wrong with the parameters in the AlphabetContainerParameterSet
DoubleSymbolException - if the definitions within parameters contain a symbol twice
ParameterSetParser.NotInstantiableException - if an instance could not be created

AlphabetContainer

public AlphabetContainer(StringBuffer xml)
                  throws NonParsableException
The standard constructor for the interface Storable. Creates a new AlphabetContainer out of its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AlphabetContainer could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable

Method Detail

getSimplifiedAlphabetContainer

public static AlphabetContainer getSimplifiedAlphabetContainer(Alphabet[] abc,
                                                               int[] assignment)
This method creates a new AlphabetContainer that uses as few Alphabets as possible to describe the container. So, if possible, Alphabets will be reused.

Parameters:
abc - the Alphabets
assignment - the assignment of the Alphabets to the positions
Returns:
an AlphabetContainer that uses as few Alphabets as possible
See Also:
AlphabetContainer(Alphabet[], int[])
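
The idea of reusing Alphabets can be sketched with a standalone model. This is not the library's code: Alphabets are represented by names, equal names stand in for equal Alphabets, and the class SimplifySketch and its method simplify are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SimplifySketch {

    // Hypothetical helper: collects the distinct alphabets that are actually
    // used into 'distinct' and returns the assignment remapped to that list.
    public static int[] simplify(String[] abc, int[] assignment, List<String> distinct) {
        int[] remapped = new int[assignment.length];
        for (int pos = 0; pos < assignment.length; pos++) {
            String name = abc[assignment[pos]];
            int idx = distinct.indexOf(name);
            if (idx < 0) {
                // First occurrence: register the alphabet once.
                distinct.add(name);
                idx = distinct.size() - 1;
            }
            remapped[pos] = idx;
        }
        return remapped;
    }

    public static void main(String[] args) {
        String[] abc = {"DNA", "DNA", "protein"}; // "DNA" is listed twice
        int[] assignment = {0, 1, 2, 1};
        List<String> distinct = new ArrayList<>();
        int[] remapped = simplify(abc, assignment, distinct);
        System.out.println(distinct + " " + Arrays.toString(remapped));
    }
}
```

Here the two "DNA" entries collapse into one, so the four positions are described by two Alphabets instead of three.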

insertAlphabet

public static AlphabetContainer insertAlphabet(AlphabetContainer aC,
                                               Alphabet a,
                                               boolean[] useNewAlphabet)
                                        throws IllegalArgumentException
This method may be used to construct a new AlphabetContainer by incorporating additional Alphabets into an existing AlphabetContainer.

Parameters:
aC - the AlphabetContainer used as template for the returned AlphabetContainer
a - the Alphabet that should be inserted
useNewAlphabet - an array that defines which Alphabet is used at which position (true selects the new Alphabet)
Incorporating the additional Alphabet means defining new, additional positions at which the given Alphabet is used. For example, let the given AlphabetContainer contain three Alphabets A0, A1, A2 (Ai for position i) and therefore have a possible length of three. Calling this method with this AlphabetContainer, an additional Alphabet A3 and an assignment array of [false, true, true, false, false] returns a new AlphabetContainer with a possible length of five that uses the following Alphabets for its positions: A0, A3, A3, A1, A2. If the given AlphabetContainer has a possible length not equal to zero, the assignment array must contain exactly as many false values as the length of the given AlphabetContainer.
Returns:
a new AlphabetContainer as described above
Throws:
IllegalArgumentException - if useNewAlphabet is null or has length 0
See Also:
AlphabetContainer(Alphabet[], int[])
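
The A0, A1, A2 plus A3 example from the description can be traced with a standalone model. This is a sketch under assumptions, not the library's implementation: Alphabets are represented by names, only a template with a fixed (non-zero) possible length is modelled, and the class InsertAlphabetSketch, the method insert, and the choice of exception for a mismatched number of false values are hypothetical.

```java
import java.util.Arrays;

public class InsertAlphabetSketch {

    // Hypothetical helper: true positions receive the added alphabet,
    // false positions consume the template's alphabets in order.
    public static String[] insert(String[] template, String added, boolean[] useNewAlphabet) {
        if (useNewAlphabet == null || useNewAlphabet.length == 0) {
            throw new IllegalArgumentException("useNewAlphabet must not be null or empty");
        }
        String[] result = new String[useNewAlphabet.length];
        int next = 0; // next unused position of the template
        for (int pos = 0; pos < useNewAlphabet.length; pos++) {
            if (useNewAlphabet[pos]) {
                result[pos] = added;
            } else {
                if (next >= template.length) {
                    throw new IllegalArgumentException("too many false values");
                }
                result[pos] = template[next++];
            }
        }
        if (next != template.length) {
            throw new IllegalArgumentException("too few false values");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] template = {"A0", "A1", "A2"};
        boolean[] useNew = {false, true, true, false, false};
        // prints [A0, A3, A3, A1, A2]
        System.out.println(Arrays.toString(insert(template, "A3", useNew)));
    }
}
```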

checkConsistency

public boolean checkConsistency(AlphabetContainer abc)
Checks if this AlphabetContainer is consistent with another AlphabetContainer.

Parameters:
abc - the second AlphabetContainer
Returns:
true if the AlphabetContainers are consistent, false otherwise
See Also:
Comparable.compareTo(Object)

compareTo

public int compareTo(AlphabetContainer abc)
Specified by:
compareTo in interface Comparable<AlphabetContainer>

getAlphabetAt

public Alphabet getAlphabetAt(int pos)
Returns the underlying Alphabet of position pos. Note that the Alphabet is returned by reference, not as a copy, so handle it with care.

Parameters:
pos - the position
Returns:
the Alphabet of the given position

getAlphabetLengthAt

public double getAlphabetLengthAt(int pos)
Returns the length of the underlying Alphabet of position pos.

Parameters:
pos - the position
Returns:
the length of the underlying Alphabet of position pos

getCode

public double getCode(int pos,
                      String sym)
               throws WrongAlphabetException
Returns the encoded symbol for sym of the Alphabet of position pos of this AlphabetContainer.

Parameters:
pos - the position of the Alphabet
sym - the symbol that should be returned encoded
Returns:
the encoded symbol
Throws:
WrongAlphabetException - if the symbol is not defined in the Alphabet of the given position
See Also:
getAlphabetAt(int)

getCompositeContainer

public AlphabetContainer getCompositeContainer(int[] start,
                                               int[] length)
Returns an AlphabetContainer of Alphabets e.g. for composite motifs/sequences.

Parameters:
start - the array of start indices
length - the array of lengths
Returns:
the AlphabetContainer
See Also:
getSubContainer(int, int)

getCurrentParameterSet

public AlphabetContainer.AbstractAlphabetContainerParameterSet<? extends AlphabetContainer> getCurrentParameterSet()
                                                                                                            throws Exception
Description copied from interface: InstantiableFromParameterSet
Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class. If the current instance was not created using an InstanceParameterSet, an equivalent InstanceParameterSet should be returned, so that an instance created using this InstanceParameterSet would be in principle equal to the current instance.

Specified by:
getCurrentParameterSet in interface InstantiableFromParameterSet
Returns:
the current InstanceParameterSet
Throws:
Exception - if the InstanceParameterSet could not be returned

getDelim

public String getDelim()
Returns the delimiter that should be used (for writing e.g. a sequence).

Returns:
the delimiter

getMaximalAlphabetLength

public double getMaximalAlphabetLength()
Returns the maximal Alphabet length of this AlphabetContainer.

Returns:
the maximal Alphabet length

getMin

public double getMin(int pos)
Returns the minimal value of the underlying Alphabet of position pos.

Parameters:
pos - the given position
Returns:
the minimal value of the Alphabet of the given position
See Also:
Alphabet.getMin()

getMinimalAlphabetLength

public double getMinimalAlphabetLength()
Returns the minimal Alphabet length of this AlphabetContainer.

Returns:
the minimal Alphabet length of this AlphabetContainer

getPossibleLength

public int getPossibleLength()
Returns the possible length for Sequences using this AlphabetContainer. If 0 (zero) is returned, all lengths are possible.

Returns:
the possible length using this AlphabetContainer
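
The convention that 0 means "all lengths are possible" is easy to get wrong in a length check. A minimal sketch of such a check follows; the class PossibleLengthSketch and the helper acceptsLength are hypothetical and take the value returned by getPossibleLength() as a plain int.

```java
public class PossibleLengthSketch {

    // Hypothetical helper: decides whether a sequence of the given length
    // fits a container that reports 'possibleLength'.
    public static boolean acceptsLength(int possibleLength, int sequenceLength) {
        // 0 is the special value: every length is allowed.
        return possibleLength == 0 || possibleLength == sequenceLength;
    }

    public static void main(String[] args) {
        System.out.println(acceptsLength(0, 17)); // true: any length fits
        System.out.println(acceptsLength(5, 5));  // true: exact match
        System.out.println(acceptsLength(5, 7));  // false: fixed length differs
    }
}
```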

getSubContainer

public AlphabetContainer getSubContainer(int start,
                                         int length)
Returns a sub-container with the Alphabets for the positions starting at start and with length length. The method can be used, e.g., for subsequences.

Parameters:
start - the index of the start position
length - the length
Returns:
the sub-container of Alphabets
See Also:
getCompositeContainer(int[], int[])
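
How a sub-container relates to the positions of the original container can be sketched with a standalone model. This is not the library's code: Alphabets are represented by per-position names, the class SubContainerSketch and its methods are hypothetical, and treating the composite variant as the concatenation of the selected ranges is an assumption based on the parameter descriptions of getCompositeContainer(int[], int[]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubContainerSketch {

    // Hypothetical helper: alphabets of positions start .. start+length-1.
    public static String[] sub(String[] perPosition, int start, int length) {
        if (start < 0 || length < 0 || start + length > perPosition.length) {
            throw new IllegalArgumentException("range exceeds the container");
        }
        return Arrays.copyOfRange(perPosition, start, start + length);
    }

    // Hypothetical helper: concatenates several ranges (assumed semantics).
    public static String[] composite(String[] perPosition, int[] start, int[] length) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < start.length; i++) {
            out.addAll(Arrays.asList(sub(perPosition, start[i], length[i])));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] abc = {"A0", "A1", "A2", "A3", "A4"};
        System.out.println(Arrays.toString(sub(abc, 1, 3)));
        System.out.println(Arrays.toString(composite(abc, new int[]{0, 3}, new int[]{2, 2})));
    }
}
```

In this model, sub(abc, 1, 3) selects A1, A2, A3, and the composite call joins positions 0-1 and 3-4 into A0, A1, A3, A4.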

getSymbol

public String getSymbol(int pos,
                        double val)
Returns a String representation of the encoded symbol val of the Alphabet of position pos of this AlphabetContainer.

Parameters:
pos - the position of the Alphabet
val - the value of the encoded symbol
Returns:
a String representation for the encoded symbol val of the Alphabet of position pos

ignoresCase

public final boolean ignoresCase()
Indicates if all used Alphabets ignore the case.

Returns:
true if all used alphabets ignore the case, false otherwise

isDiscrete

public final boolean isDiscrete()
Indicates if all positions use discrete Alphabets.

Returns:
true if all positions use discrete Alphabets, otherwise false

isDiscreteAt

public boolean isDiscreteAt(int pos)
Indicates if position pos is a discrete random variable, i.e. if the Alphabet of position pos is discrete.

Parameters:
pos - the position
Returns:
true if position pos is a discrete random variable, false otherwise

isEncodedSymbol

public boolean isEncodedSymbol(int pos,
                               double continuous)
Indicates if continuous is a symbol of the Alphabet used at position pos of the AlphabetContainer.

Parameters:
pos - the position
continuous - the continuous value
Returns:
true if continuous is a symbol of the Alphabet used in position pos, false otherwise

isSimple

public final boolean isSimple()
Indicates whether all random variables are defined over the same range, i.e. if the AlphabetContainer is simple and all positions use the same (fixed) Alphabet.

Returns:
whether all random variables are defined over the same range, i.e. if the AlphabetContainer is simple

isReverseComplementable

public final boolean isReverseComplementable()
This method helps to determine if the AlphabetContainer also computes the reverse complement of a Sequence.

Returns:
true if the AlphabetContainer also computes the reverse complement of a Sequence, false otherwise

toDiscrete

public int toDiscrete(int pos,
                      double val)
Returns the discrete value for val of the Alphabet of position pos in the AlphabetContainer.

Parameters:
pos - the position
val - the value
Returns:
the discrete value for val of the Alphabet of position pos
See Also:
isDiscreteAt(int)

toString

public String toString()
Overrides:
toString in class Object

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

getType

public final AlphabetContainer.AlphabetContainerType getType()
Returns the type of this AlphabetContainer.

Returns:
the type
See Also:
AlphabetContainer.AlphabetContainerType

getAlphabetIndexForPosition

public int getAlphabetIndexForPosition(int pos)
This method returns the index of the Alphabet that is used for the given position.

Parameters:
pos - the position
Returns:
the index of the used Alphabet

getNumberOfAlphabets

public int getNumberOfAlphabets()
This method returns the number of Alphabets used in the current AlphabetContainer.

Returns:
the number of used Alphabets

getIndexForAlphabets

public int[] getIndexForAlphabets()
This method returns an object that is used for assigning the positions of the Sequences to specific Alphabets.

Returns:
null or an int array