de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous
Class HomogeneousTrainSM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousTrainSM
All Implemented Interfaces:
InstantiableFromParameterSet, SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable
Direct Known Subclasses:
HomogeneousMM

public abstract class HomogeneousTrainSM
extends DiscreteGraphicalTrainSM

This class implements homogeneous models of arbitrary order.

Author:
Jens Keilwagen
See Also:
HomogeneousTrainSMParameterSet

Nested Class Summary
protected  class HomogeneousTrainSM.HomCondProb
          This class handles the (conditional) probabilities of a homogeneous model efficiently.
 
Field Summary
protected  byte order
          The order of the model.
protected  int[] powers
          The powers of the alphabet length.
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
params, trained
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
HomogeneousTrainSM(HomogeneousTrainSMParameterSet params)
          Creates a homogeneous model from a parameter set.
HomogeneousTrainSM(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
protected  void check(Sequence sequence, int startpos, int endpos)
          Checks some constraints; in general, these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos.
protected  int chooseFromDistr(Constraint distr, int start, int end, double randNo)
          Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end.
protected  HomogeneousTrainSM.HomCondProb[] cloneHomProb(HomogeneousTrainSM.HomCondProb[] p)
          Clones the given array of conditional probabilities.
 DataSet emitDataSet(int no, int... length)
          Creates a DataSet of a given number of Sequences from a trained homogeneous model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
protected abstract  Sequence getRandomSequence(Random r, int length)
          This method creates a random Sequence from a trained homogeneous model.
protected abstract  double logProbFor(Sequence sequence, int startpos, int endpos)
          This method computes the logarithm of the probability of the given Sequence in the given interval.
protected  void set(DGTrainSMParameterSet params, boolean trained)
          Sets the parameters as internal parameters and does some essential computations.
 void train(DataSet[] data)
          Trains the homogeneous model on all given DataSets.
abstract  void train(DataSet[] data, double[][] weights)
          Trains the homogeneous model using an array of weighted DataSets.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
clone, fromXML, getCurrentParameterSet, getDescription, getESS, getFurtherModelInfos, getXMLTag, isInitialized, setFurtherModelInfos, toString, toXML
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel
train
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel
getLogPriorTerm
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getInstanceName
 

Field Detail

powers

protected int[] powers
The powers of the alphabet length.
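A common use of such an array is to encode a context of several symbols as one integer index, in mixed-radix style. The following is a minimal sketch, assuming that powers[i] holds |A|^i for an alphabet of size |A|; the exact length and use of the array inside HomogeneousTrainSM are not documented here, and the class PowersSketch and its methods are hypothetical.

```java
/** Hypothetical helper illustrating how powers of the alphabet length can index contexts. */
final class PowersSketch {

    /** Returns {1, a, a^2, ..., a^maxExponent} for alphabet length a. */
    static int[] computePowers(int alphabetLength, int maxExponent) {
        int[] powers = new int[maxExponent + 1];
        powers[0] = 1;
        for (int i = 1; i <= maxExponent; i++) {
            powers[i] = powers[i - 1] * alphabetLength;
        }
        return powers;
    }

    /**
     * Encodes the symbols seq[start..start+len-1] (each in [0, alphabetLength))
     * as one integer; the last symbol is the least significant digit.
     */
    static int encode(int[] seq, int start, int len, int[] powers) {
        int index = 0;
        for (int i = 0; i < len; i++) {
            index += seq[start + i] * powers[len - 1 - i];
        }
        return index;
    }
}
```

For a DNA alphabet (|A| = 4) and a context of order 2, this yields 16 distinct context indices, so a conditional probability table can be stored as a flat array.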


order

protected byte order
The order of the model.

Constructor Detail

HomogeneousTrainSM

public HomogeneousTrainSM(HomogeneousTrainSMParameterSet params)
                   throws CloneNotSupportedException,
                          IllegalArgumentException,
                          NonParsableException
Creates a homogeneous model from a parameter set.

Parameters:
params - the parameter set
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
IllegalArgumentException - if the parameter set is not instantiated
NonParsableException - if the parameter set is not parsable
See Also:
HomogeneousTrainSMParameterSet, DiscreteGraphicalTrainSM.DiscreteGraphicalTrainSM(DGTrainSMParameterSet)

HomogeneousTrainSM

public HomogeneousTrainSM(StringBuffer stringBuff)
                   throws NonParsableException
The standard constructor for the interface Storable. Creates a new HomogeneousTrainSM out of its XML representation.

Parameters:
stringBuff - the XML representation as StringBuffer
Throws:
NonParsableException - if the HomogeneousTrainSM could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, DiscreteGraphicalTrainSM.DiscreteGraphicalTrainSM(StringBuffer)
Method Detail

emitDataSet

public final DataSet emitDataSet(int no,
                                 int... length)
                          throws NotTrainedException,
                                 IllegalArgumentException,
                                 EmptyDataSetException,
                                 WrongAlphabetException,
                                 WrongSequenceTypeException
Creates a DataSet of a given number of Sequences from a trained homogeneous model.

Specified by:
emitDataSet in interface StatisticalModel
Overrides:
emitDataSet in class AbstractTrainableStatisticalModel
Parameters:
no - the number of Sequences that should be in the DataSet
length - either a single length used for all Sequences, or an array of lengths in which the Sequence with index i has length length[i]
Returns:
the created DataSet
Throws:
NotTrainedException - if the model was not trained
IllegalArgumentException - if the dimension of length is neither 1 nor no
EmptyDataSetException - if no == 0
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet
See Also:
DataSet.DataSet(String, Sequence...)
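The rules for the length parameter can be summarized in a small sketch. It assumes only what is stated above (a dimension of 1 or no, and no == 0 being an error); the class EmitLengths is hypothetical, and IllegalStateException stands in for Jstacs' EmptyDataSetException to keep the example self-contained.

```java
import java.util.Arrays;

/** Hypothetical sketch of how the varargs length parameter of emitDataSet can be resolved. */
final class EmitLengths {

    /** Returns one length per Sequence to be emitted. */
    static int[] resolveLengths(int no, int... length) {
        if (no == 0) {
            // stand-in for EmptyDataSetException thrown by the real method
            throw new IllegalStateException("no == 0: an empty data set cannot be created");
        }
        if (length.length == 1) {
            int[] all = new int[no];
            Arrays.fill(all, length[0]); // the same length for every Sequence
            return all;
        }
        if (length.length == no) {
            return length.clone(); // Sequence i gets length length[i]
        }
        throw new IllegalArgumentException(
            "length must have dimension 1 or " + no + ", but has " + length.length);
    }
}
```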

getRandomSequence

protected abstract Sequence getRandomSequence(Random r,
                                              int length)
                                       throws WrongAlphabetException,
                                              WrongSequenceTypeException
This method creates a random Sequence from a trained homogeneous model.

Parameters:
r - the random generator
length - the length of the Sequence
Returns:
the created Sequence
Throws:
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet
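To illustrate what a subclass has to do here, the following sketch samples from a first-order homogeneous Markov chain stored as an initial distribution and one transition matrix shared by all positions. This representation is an assumption for illustration only (the concrete subclasses keep their parameters in HomogeneousTrainSM.HomCondProb objects and support arbitrary order), and the class MarkovSampler is hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch: sampling a symbol sequence from a first-order homogeneous Markov chain. */
final class MarkovSampler {

    /** Draws an index from the discrete distribution p using one uniform number u in [0,1). */
    static int sample(double[] p, double u) {
        double cumulative = 0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (u < cumulative) {
                return i;
            }
        }
        return p.length - 1; // guards against rounding when the probabilities sum to slightly less than 1
    }

    /**
     * initial[a] = P(X_0 = a); transition[a][b] = P(X_t = b | X_(t-1) = a),
     * identical for every position t, which is what makes the chain homogeneous.
     */
    static int[] randomSequence(Random r, int length, double[] initial, double[][] transition) {
        int[] seq = new int[length];
        for (int t = 0; t < length; t++) {
            double[] dist = (t == 0) ? initial : transition[seq[t - 1]];
            seq[t] = sample(dist, r.nextDouble());
        }
        return seq;
    }
}
```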

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Overrides:
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Returns:
maximal used Markov order

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().

Returns:
the numerical characteristics of the current instance
Throws:
Exception - if some of the characteristics could not be defined

getLogProbFor

public final double getLogProbFor(Sequence sequence,
                                  int startpos,
                                  int endpos)
                           throws NotTrainedException,
                                  Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

It extends the method StatisticalModel.getLogProbFor(Sequence, int): since the model could be, e.g., homogeneous, the length of the sequences whose probability should be returned is not fixed. Additionally, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model

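For a homogeneous model, the log-probability of the part from startpos to endpos (inclusive) is a sum of log conditional probabilities taken from the same tables at every position. The sketch below assumes a first-order chain stored as plain arrays and ignores the symbols before startpos; both are simplifications for illustration, the class MarkovLogProb is hypothetical, and it does not reproduce how HomogeneousTrainSM handles the first positions of a part.

```java
/** Hypothetical sketch: log-probability of seq[startpos..endpos] under a first-order homogeneous Markov chain. */
final class MarkovLogProb {

    /** Assumes 0 <= startpos <= endpos < seq.length; no validation is done here. */
    static double logProb(int[] seq, int startpos, int endpos, double[] initial, double[][] transition) {
        // the first symbol of the part has no context inside the part, so the initial distribution is used
        double logP = Math.log(initial[seq[startpos]]);
        for (int t = startpos + 1; t <= endpos; t++) {
            // the same transition table is used at every position
            logP += Math.log(transition[seq[t - 1]][seq[t]]);
        }
        return logP;
    }
}
```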
train

public void train(DataSet[] data)
           throws Exception
Trains the homogeneous model on all given DataSets.

Parameters:
data - the given DataSets
Throws:
Exception - if something went wrong
See Also:
train(DataSet[], double[][])

train

public abstract void train(DataSet[] data,
                           double[][] weights)
                    throws Exception
Trains the homogeneous model using an array of weighted DataSets. The Sequence weights in weights[i] are for the DataSet in data[i].

Parameters:
data - the given DataSets
weights - the Sequence weights, where weights[i] holds the weights for the Sequences of data[i]
Throws:
Exception - if something went wrong; in particular, data.length has to equal weights.length
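Weighted training of a homogeneous model can be sketched as accumulating weighted transition counts over all DataSets and normalizing them. The sketch assumes a first-order model, plain int arrays in place of DataSet and Sequence, and an additive pseudocount; treating a null weights[i] as all ones is an assumption of this sketch (it mirrors how an unweighted train(DataSet[]) could delegate), not documented behavior, and the class WeightedTrainer is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: estimating first-order transition probabilities from weighted data. */
final class WeightedTrainer {

    /**
     * data[i][j] is Sequence j of data set i, and weights[i][j] is its weight.
     * A null weights[i] is treated as all weights 1 (assumption of this sketch).
     */
    static double[][] train(int[][][] data, double[][] weights, int alphabetLength, double pseudocount) {
        if (data.length != weights.length) {
            throw new IllegalArgumentException("data.length must equal weights.length");
        }
        double[][] probs = new double[alphabetLength][alphabetLength];
        for (double[] row : probs) {
            Arrays.fill(row, pseudocount); // additive smoothing of every transition count
        }
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) {
                double w = (weights[i] == null) ? 1.0 : weights[i][j];
                int[] s = data[i][j];
                for (int t = 1; t < s.length; t++) {
                    probs[s[t - 1]][s[t]] += w; // weighted count of transition s[t-1] -> s[t]
                }
            }
        }
        // normalize each row to a conditional distribution P(b | a); unseen rows become uniform
        for (double[] row : probs) {
            double sum = 0;
            for (double c : row) {
                sum += c;
            }
            for (int b = 0; b < row.length; b++) {
                row[b] = (sum > 0) ? row[b] / sum : 1.0 / row.length;
            }
        }
        return probs;
    }
}
```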

set

protected void set(DGTrainSMParameterSet params,
                   boolean trained)
            throws CloneNotSupportedException,
                   NonParsableException
Description copied from class: DiscreteGraphicalTrainSM
Sets the parameters as internal parameters and does some essential computations. Used in fromParameterSet-methods.

Overrides:
set in class DiscreteGraphicalTrainSM
Parameters:
params - the new ParameterSet
trained - indicates if the model is trained or not
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
NonParsableException - if the parameters of the model could not be parsed

check

protected void check(Sequence sequence,
                     int startpos,
                     int endpos)
              throws NotTrainedException,
                     IllegalArgumentException
Checks some constraints; in general, these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos.

Overrides:
check in class DiscreteGraphicalTrainSM
Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Throws:
NotTrainedException - if the model is not trained
IllegalArgumentException - if some arguments are wrong
See Also:
DiscreteGraphicalTrainSM.check(Sequence, int, int)
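The kind of validation described here can be sketched on plain arrays. The specific conditions below (trained flag, interval bounds, symbols within the alphabet) are assumptions about what such a check typically covers, not a transcription of the Jstacs implementation; the class IntervalCheck is hypothetical, and IllegalStateException stands in for NotTrainedException.

```java
/** Hypothetical sketch of the kind of validation check(Sequence, int, int) performs. */
final class IntervalCheck {

    static void check(boolean trained, int[] seq, int startpos, int endpos, int alphabetLength) {
        if (!trained) {
            // stand-in for NotTrainedException thrown by the real method
            throw new IllegalStateException("the model is not trained");
        }
        if (startpos < 0 || startpos > endpos || endpos >= seq.length) {
            throw new IllegalArgumentException(
                "invalid interval [" + startpos + ", " + endpos + "] for length " + seq.length);
        }
        for (int t = startpos; t <= endpos; t++) {
            if (seq[t] < 0 || seq[t] >= alphabetLength) {
                throw new IllegalArgumentException(
                    "symbol " + seq[t] + " at position " + t + " is not in the alphabet");
            }
        }
    }
}
```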

chooseFromDistr

protected final int chooseFromDistr(Constraint distr,
                                    int start,
                                    int end,
                                    double randNo)
Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end.

The instance distr is not changed in the process.

Parameters:
distr - the distribution
start - the start index
end - the end index
randNo - a random number in [0,1]
Returns:
the chosen value
See Also:
Constraint.getFreq(int)
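The selection can be understood as inverse-CDF sampling over the frequencies at indices start to end. The sketch below replaces the Constraint with a plain double array; dividing by the total so that unnormalized frequencies also work is an assumption of this sketch, and the class ChooseFromDistr is hypothetical. Like the documented method, it does not modify its input.

```java
/** Hypothetical sketch of chooseFromDistr on a plain array instead of a Constraint. */
final class ChooseFromDistr {

    /** Returns a value in [0, end - start] chosen by inverse-CDF sampling over freq[start..end]. */
    static int choose(double[] freq, int start, int end, double randNo) {
        double total = 0;
        for (int i = start; i <= end; i++) {
            total += freq[i];
        }
        double threshold = randNo * total; // freq need not be normalized in this sketch
        double cumulative = 0;
        for (int i = start; i < end; i++) {
            cumulative += freq[i];
            if (threshold < cumulative) {
                return i - start; // offset relative to start
            }
        }
        return end - start; // the last index; also covers randNo == 1
    }
}
```

For example, with frequencies 0.2, 0.3 and 0.5 at indices 1 to 3, a random number of 0.3 falls into the second bin and yields 1.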

logProbFor

protected abstract double logProbFor(Sequence sequence,
                                     int startpos,
                                     int endpos)
This method computes the logarithm of the probability of the given Sequence in the given interval. The method is only used in StatisticalModel.getLogProbFor(Sequence, int, int) after the method check(Sequence, int, int) has been invoked.

Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Returns:
the logarithm of the probability for the given subsequence
See Also:
check(Sequence, int, int), StatisticalModel.getLogProbFor(Sequence, int, int)
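The relationship between the final public method, check and the abstract logProbFor follows the template-method pattern: validation happens once in the superclass, and subclasses implement only the computation. The sketch below shows the pattern on plain arrays; LogProbTemplate and the toy subclass UniformModel are hypothetical and only mirror the structure described above.

```java
/** Hypothetical sketch of the template-method pattern described for getLogProbFor and logProbFor. */
abstract class LogProbTemplate {

    /** Public entry point: validates first, then delegates to the subclass; final, so it cannot be bypassed. */
    public final double getLogProbFor(int[] seq, int startpos, int endpos) {
        check(seq, startpos, endpos);
        return logProbFor(seq, startpos, endpos);
    }

    /** Validates the interval; subclasses may add further constraints. */
    protected void check(int[] seq, int startpos, int endpos) {
        if (startpos < 0 || startpos > endpos || endpos >= seq.length) {
            throw new IllegalArgumentException("invalid interval [" + startpos + ", " + endpos + "]");
        }
    }

    /** Only called after check succeeded, so implementations need not re-validate. */
    protected abstract double logProbFor(int[] seq, int startpos, int endpos);
}

/** Toy subclass: every symbol is independent with probability 1 / alphabetLength. */
class UniformModel extends LogProbTemplate {
    private final int alphabetLength;

    UniformModel(int alphabetLength) {
        this.alphabetLength = alphabetLength;
    }

    @Override
    protected double logProbFor(int[] seq, int startpos, int endpos) {
        return (endpos - startpos + 1) * -Math.log(alphabetLength);
    }
}
```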

cloneHomProb

protected HomogeneousTrainSM.HomCondProb[] cloneHomProb(HomogeneousTrainSM.HomCondProb[] p)
Clones the given array of conditional probabilities.

Parameters:
p - the original conditional probabilities
Returns:
an array of clones
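Cloning the array alone (p.clone()) would give a shallow copy whose elements are shared with the original; "an array of clones" means each element is cloned as well. The sketch below uses a hypothetical stand-in CondProb for HomogeneousTrainSM.HomCondProb and a hypothetical helper CloneSketch; the actual fields of HomCondProb are not documented here.

```java
/** Hypothetical stand-in for HomCondProb: a table of conditional probabilities. */
final class CondProb implements Cloneable {
    double[] probs;

    CondProb(double... probs) {
        this.probs = probs;
    }

    @Override
    public CondProb clone() {
        try {
            CondProb copy = (CondProb) super.clone();
            copy.probs = probs.clone(); // deep-copy the mutable state
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}

final class CloneSketch {

    /** Returns a new array whose elements are clones of the elements of p. */
    static CondProb[] cloneAll(CondProb[] p) {
        CondProb[] copy = new CondProb[p.length];
        for (int i = 0; i < p.length; i++) {
            copy[i] = p[i].clone();
        }
        return copy;
    }
}
```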