de.jstacs.sequenceScores.statisticalModels.trainable
Class CompositeTrainSM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.CompositeTrainSM
All Implemented Interfaces:
SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable

public class CompositeTrainSM
extends AbstractTrainableStatisticalModel

This class models sequences by modelling the different positions of each sequence with different models. For instance, one can use this class to model a sequence of subsequences, where each subsequence is modelled by a different model.

Author:
Jens Keilwagen

Field Summary
protected  int[][] lengths
          The length for each component.
protected  TrainableStatisticalModel[] models
          The models for the components.
protected  int[][] starts
          The start indices.
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
CompositeTrainSM(AlphabetContainer alphabets, int[] assignment, TrainableStatisticalModel... models)
          Creates a new CompositeTrainSM.
CompositeTrainSM(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
 CompositeTrainSM clone()
          Follows the conventions of Object's clone()-method.
 void fromXML(StringBuffer representation)
          This method should only be used by the constructor that works on a StringBuffer.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 int[] getLengthOfModels()
          This method returns the length of the models in the CompositeTrainSM.
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 TrainableStatisticalModel[] getModels()
          Returns a deep copy of the models.
 int getNumberOfModels()
          This method returns the number of models in the CompositeTrainSM.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
 boolean isInitialized()
          This method can be used to determine whether the instance is initialized.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 void train(DataSet data, double[] weights)
          Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

models

protected TrainableStatisticalModel[] models
The models for the components.


starts

protected int[][] starts
The start indices.


lengths

protected int[][] lengths
The length for each component.

Constructor Detail

CompositeTrainSM

public CompositeTrainSM(AlphabetContainer alphabets,
                        int[] assignment,
                        TrainableStatisticalModel... models)
                 throws WrongAlphabetException,
                        CloneNotSupportedException
Creates a new CompositeTrainSM.

Parameters:
alphabets - the alphabets used in the models
assignment - an array assigning each position to a model
models - the single models
Throws:
IllegalArgumentException - if something is wrong with the assignment of the models
WrongAlphabetException - if (parts of) the alphabet does not match with the alphabets of the models
CloneNotSupportedException - if the models could not be cloned
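
The following is a minimal sketch of how an assignment array could be turned into per-model start indices and lengths, as suggested by the two-dimensional starts and lengths fields. It is not the Jstacs implementation: the class AssignmentSketch and the method toBlocks are hypothetical, and the assumption that one model may cover several separate blocks is inferred only from the shape of those fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Jstacs. */
final class AssignmentSketch {

    /**
     * Groups the positions assigned to each model into contiguous blocks.
     * Returns {starts, lengths}, where starts[m][b] and lengths[m][b]
     * describe block b of model m.
     */
    static int[][][] toBlocks(int[] assignment, int numberOfModels) {
        List<List<Integer>> starts = new ArrayList<>();
        List<List<Integer>> lengths = new ArrayList<>();
        for (int m = 0; m < numberOfModels; m++) {
            starts.add(new ArrayList<>());
            lengths.add(new ArrayList<>());
        }
        int pos = 0;
        while (pos < assignment.length) {
            int model = assignment[pos];
            if (model < 0 || model >= numberOfModels) {
                throw new IllegalArgumentException(
                        "Position " + pos + " refers to unknown model " + model);
            }
            // extend the block while the same model is assigned
            int end = pos;
            while (end < assignment.length && assignment[end] == model) {
                end++;
            }
            starts.get(model).add(pos);
            lengths.get(model).add(end - pos);
            pos = end;
        }
        int[][] s = new int[numberOfModels][];
        int[][] l = new int[numberOfModels][];
        for (int m = 0; m < numberOfModels; m++) {
            s[m] = starts.get(m).stream().mapToInt(Integer::intValue).toArray();
            l[m] = lengths.get(m).stream().mapToInt(Integer::intValue).toArray();
        }
        return new int[][][] { s, l };
    }
}
```

With the assignment {0, 0, 0, 1, 1, 0} and two models, this sketch gives model 0 the blocks starting at 0 and 5 (lengths 3 and 1) and model 1 the block starting at 3 (length 2).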

CompositeTrainSM

public CompositeTrainSM(StringBuffer stringBuff)
                 throws NonParsableException
The standard constructor for the interface Storable. Creates a new CompositeTrainSM out of a StringBuffer.

Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - if the StringBuffer is not parsable
Method Detail

clone

public CompositeTrainSM clone()
                       throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class AbstractTrainableStatisticalModel
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X that directly inherits from AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. Object o = (X)super.clone();
2. all additional member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
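
The three steps above can be sketched as follows. The classes ComponentSketch and CompositeCloneSketch are hypothetical stand-ins written only for this illustration; they do not extend AbstractTrainableStatisticalModel and do not reproduce the Jstacs code.

```java
/** Hypothetical component type standing in for a trainable model. */
final class ComponentSketch implements Cloneable {
    double[] parameters;

    ComponentSketch(double... parameters) {
        this.parameters = parameters.clone();
    }

    @Override
    public ComponentSketch clone() throws CloneNotSupportedException {
        ComponentSketch copy = (ComponentSketch) super.clone();
        copy.parameters = parameters.clone(); // array member: deep copy
        return copy;
    }
}

/** Hypothetical composite that follows the three steps described above. */
final class CompositeCloneSketch implements Cloneable {
    ComponentSketch[] models; // array of objects: needs a deep copy
    int[][] starts;           // nested array: needs a deep copy

    CompositeCloneSketch(ComponentSketch[] models, int[][] starts) {
        this.models = models;
        this.starts = starts;
    }

    @Override
    public CompositeCloneSketch clone() throws CloneNotSupportedException {
        // 1. obtain the copy from super.clone()
        CompositeCloneSketch copy = (CompositeCloneSketch) super.clone();
        // 2. deeply copy all members that are not of simple data types
        copy.models = new ComponentSketch[models.length];
        for (int i = 0; i < models.length; i++) {
            copy.models[i] = models[i].clone();
        }
        copy.starts = new int[starts.length][];
        for (int i = 0; i < starts.length; i++) {
            copy.starts[i] = starts[i].clone();
        }
        // 3. return the copy
        return copy;
    }
}
```

After cloning, changes to the copy's models or starts leave the original untouched, which is the property the deep copy in step 2 is meant to guarantee.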

getCharacteristics

public ResultSet getCharacteristics()
                             throws Exception
Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance. This could be e.g. the number of edges for a Bayesian network or an image showing some representation of the instance. The set of characteristics should always include the XML-representation of the instance. The corresponding result type is StorableResult.

Specified by:
getCharacteristics in interface SequenceScore
Overrides:
getCharacteristics in class AbstractTrainableStatisticalModel
Returns:
the characteristics of the current instance
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name

getLengthOfModels

public int[] getLengthOfModels()
This method returns the length of the models in the CompositeTrainSM.

Returns:
the length of the models as an array

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Overrides:
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().

Returns:
the numerical characteristics of the current instance
Throws:
Exception - if some of the characteristics could not be defined

getModels

public TrainableStatisticalModel[] getModels()
                                      throws CloneNotSupportedException
Returns a deep copy of the models.

Returns:
a deep copy of the models as an array of TrainableStatisticalModels
Throws:
CloneNotSupportedException - if at least one of the models could not be cloned

getNumberOfModels

public int getNumberOfModels()
This method returns the number of models in the CompositeTrainSM.

Returns:
the number of models

getLogPriorTerm

public double getLogPriorTerm()
                       throws Exception
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws NotTrainedException,
                            Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

It extends the method StatisticalModel.getLogProbFor(Sequence, int) to cover models that are, for example, homogeneous, so that the length of the sequences whose probability should be returned is not fixed. In addition to the start position, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > endpos, endpos > sequence.length, ...) by the model
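
As an illustration of the inclusive startpos/endpos convention, the sketch below sums the log-probabilities that hypothetical component models assign to consecutive segments of a sequence. It assumes that the components are independent and cover contiguous segments from left to right; neither the names PartModelSketch, IidPartSketch and CompositeLogProbSketch nor this simplified layout are taken from Jstacs.

```java
/** Hypothetical stand-in for a component model over a discrete alphabet. */
interface PartModelSketch {
    /** Log-probability of sequence[start..end], both bounds inclusive. */
    double logProb(int[] sequence, int start, int end);
}

/** Position-independent model with fixed symbol probabilities (assumed for illustration). */
final class IidPartSketch implements PartModelSketch {
    private final double[] logP;

    IidPartSketch(double... probabilities) {
        logP = new double[probabilities.length];
        for (int i = 0; i < probabilities.length; i++) {
            logP[i] = Math.log(probabilities[i]);
        }
    }

    @Override
    public double logProb(int[] sequence, int start, int end) {
        if (start < 0 || start > end || end >= sequence.length) {
            throw new IllegalArgumentException("invalid range " + start + ".." + end);
        }
        double sum = 0;
        for (int i = start; i <= end; i++) {
            sum += logP[sequence[i]];
        }
        return sum;
    }
}

final class CompositeLogProbSketch {
    /** parts[k] scores lengths[k] consecutive positions; segments are laid out left to right. */
    static double logProb(int[] sequence, int startpos, int endpos,
                          PartModelSketch[] parts, int[] lengths) {
        int total = 0;
        for (int len : lengths) {
            total += len;
        }
        if (endpos - startpos + 1 != total) {
            throw new IllegalArgumentException("range does not match the composite length " + total);
        }
        double sum = 0;
        int offset = startpos;
        for (int k = 0; k < parts.length; k++) {
            // each component only sees its own segment
            sum += parts[k].logProb(sequence, offset, offset + lengths[k] - 1);
            offset += lengths[k];
        }
        return sum;
    }
}
```

Because endpos is inclusive, a composite of total length 4 scored from startpos 0 uses endpos 3.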

isInitialized

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized you should be able to invoke SequenceScore.getLogScoreFor(Sequence).

Returns:
true if the instance is initialized, false otherwise

fromXML

public void fromXML(StringBuffer representation)
             throws NonParsableException
Description copied from class: AbstractTrainableStatisticalModel
This method should only be used by the constructor that works on a StringBuffer. It is the counterpart of Storable.toXML().

Specified by:
fromXML in class AbstractTrainableStatisticalModel
Parameters:
representation - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Returns:
the XML representation
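
The relationship between toXML() and the StringBuffer constructor can be sketched as a round trip. The class StorableSketch and its tag format are hypothetical and much simpler than the representation Jstacs actually writes; IllegalArgumentException stands in for NonParsableException so that the example needs no library.

```java
/** Hypothetical class illustrating the toXML()/StringBuffer-constructor pairing. */
final class StorableSketch {
    private final int[] lengths;

    StorableSketch(int... lengths) {
        this.lengths = lengths.clone();
    }

    /** Counterpart of toXML(): rebuilds the instance from its serialized form. */
    StorableSketch(StringBuffer xml) {
        String text = xml.toString().trim();
        String open = "<lengths>";
        String close = "</lengths>";
        if (!text.startsWith(open) || !text.endsWith(close)) {
            throw new IllegalArgumentException("not parsable: " + text);
        }
        String body = text.substring(open.length(), text.length() - close.length());
        String[] fields = body.isEmpty() ? new String[0] : body.split(",");
        lengths = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            lengths[i] = Integer.parseInt(fields[i].trim());
        }
    }

    /** Writes the state in a made-up single-tag format. */
    StringBuffer toXML() {
        StringBuffer xml = new StringBuffer("<lengths>");
        for (int i = 0; i < lengths.length; i++) {
            if (i > 0) {
                xml.append(',');
            }
            xml.append(lengths[i]);
        }
        return xml.append("</lengths>");
    }

    int[] getLengths() {
        return lengths.clone();
    }
}
```

An instance rebuilt with new StorableSketch(original.toXML()) carries the same state as the original, which is the contract the pair of methods is meant to satisfy.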

train

public void train(DataSet data,
                  double[] weights)
           throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as there are sequences in the data set. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally. That means the result of the series train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.

Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
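
A minimal sketch of the non-incremental, weighted training contract described above, using a hypothetical symbol-frequency model. FrequencyModelSketch is not a Jstacs class, and plain int arrays stand in for DataSet and Sequence.

```java
/** Hypothetical frequency model; int[][] stands in for a DataSet of sequences. */
final class FrequencyModelSketch {
    private final double[] probabilities;
    private boolean trained = false;

    FrequencyModelSketch(int alphabetSize) {
        probabilities = new double[alphabetSize];
    }

    /** Estimates symbol frequencies from scratch; weights == null means weight one for all. */
    void train(int[][] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("number of weights and sequences differ");
        }
        // fresh counts on every call: nothing is carried over from earlier training
        double[] counts = new double[probabilities.length];
        double total = 0;
        for (int n = 0; n < data.length; n++) {
            double w = (weights == null) ? 1.0 : weights[n];
            if (w < 0) {
                throw new IllegalArgumentException("negative weight at position " + n);
            }
            for (int symbol : data[n]) {
                counts[symbol] += w;
                total += w;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no weighted observations");
        }
        for (int a = 0; a < probabilities.length; a++) {
            probabilities[a] = counts[a] / total;
        }
        trained = true;
    }

    double getProbability(int symbol) {
        if (!trained) {
            throw new IllegalStateException("model is not trained");
        }
        return probabilities[symbol];
    }
}
```

Calling train on a first data set and then on a second one leaves only the estimates from the second call, because the counts are rebuilt each time.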

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance