de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous
Class HomogeneousMM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousTrainSM
              extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousMM
All Implemented Interfaces:
InstantiableFromParameterSet, SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable

public class HomogeneousMM
extends HomogeneousTrainSM

This class implements homogeneous Markov models (hMM) of arbitrary order.

Author:
Jens Keilwagen
See Also:
HomMMParameterSet

Nested Class Summary
 
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousTrainSM
HomogeneousTrainSM.HomCondProb
 
Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousTrainSM
order, powers
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
params, trained
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
HomogeneousMM(HomMMParameterSet params)
          Creates a new homogeneous Markov model from a parameter set.
HomogeneousMM(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
 HomogeneousMM clone()
          Follows the conventions of Object's clone()-method.
protected  StringBuffer getFurtherModelInfos()
          Returns further model information as a StringBuffer.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
protected  Sequence getRandomSequence(Random r, int length)
          This method creates a random Sequence from a trained homogeneous model.
protected  String getXMLTag()
          Returns the XML tag that is used for this model in DiscreteGraphicalTrainSM.fromXML(StringBuffer) and DiscreteGraphicalTrainSM.toXML().
protected  double logProbFor(Sequence sequence, int startpos, int endpos)
          This method computes the logarithm of the probability of the given Sequence in the given interval.
protected  void set(DGTrainSMParameterSet params, boolean trained)
          Sets the parameters as internal parameters and does some essential computations.
protected  void setFurtherModelInfos(StringBuffer xml)
          This method replaces the internal model information with the information from a StringBuffer.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 void train(DataSet[] data, double[][] weights)
          Trains the homogeneous model using an array of weighted DataSets.
 void train(DataSet data, double[] weights)
          Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.HomogeneousTrainSM
check, chooseFromDistr, cloneHomProb, emitDataSet, getLogProbFor, getMaximalMarkovOrder, getNumericalCharacteristics, train
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.discrete.DiscreteGraphicalTrainSM
fromXML, getCurrentParameterSet, getDescription, getESS, isInitialized, toXML
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HomogeneousMM

public HomogeneousMM(HomMMParameterSet params)
              throws CloneNotSupportedException,
                     IllegalArgumentException,
                     NonParsableException
Creates a new homogeneous Markov model from a parameter set.

Parameters:
params - the parameter set
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
IllegalArgumentException - if the parameter set is not instantiated
NonParsableException - if the parameter set is not parsable
See Also:
HomMMParameterSet, HomogeneousTrainSM.HomogeneousTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.discrete.homogeneous.parameters.HomogeneousTrainSMParameterSet)

HomogeneousMM

public HomogeneousMM(StringBuffer stringBuff)
              throws NonParsableException
The standard constructor for the interface Storable. Creates a new HomogeneousMM out of its XML representation.

Parameters:
stringBuff - the XML representation as StringBuffer
Throws:
NonParsableException - if the HomogeneousMM could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable

Method Detail

clone

public HomogeneousMM clone()
                    throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class DiscreteGraphicalTrainSM
Returns:
a copy of the current AbstractTrainableStatisticalModel. The member AlphabetContainer is not deeply cloned, since it is assumed to be immutable. The type of the returned object is defined by the class X that directly extends AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. X o = (X) super.clone();
2. deeply copy all additional member variables of o defined by X that are not of simple data types such as int, double, ...
3. return o;
Throws:
CloneNotSupportedException - if something went wrong while cloning
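
The three cloning steps can be sketched in plain Java as follows. This is a minimal illustration, not the Jstacs implementation: the classes BaseModelSketch and CountingModelSketch and the field counts are hypothetical stand-ins for AbstractTrainableStatisticalModel and a subclass X.

```java
/** Hypothetical base class standing in for AbstractTrainableStatisticalModel. */
class BaseModelSketch implements Cloneable {
    protected int length; // simple data type: Object.clone() copies it by value

    @Override
    public BaseModelSketch clone() throws CloneNotSupportedException {
        return (BaseModelSketch) super.clone();
    }
}

/** Hypothetical subclass X with one mutable member that needs a deep copy. */
class CountingModelSketch extends BaseModelSketch {
    double[][] counts = new double[4][4];

    @Override
    public CountingModelSketch clone() throws CloneNotSupportedException {
        // Step 1: shallow copy via super.clone(), cast to the concrete type.
        CountingModelSketch copy = (CountingModelSketch) super.clone();
        // Step 2: deep-copy every member that is not a simple data type.
        copy.counts = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            copy.counts[i] = counts[i].clone();
        }
        // Step 3: return the copy.
        return copy;
    }
}
```

Without step 2, the original and the copy would share the same counts array, so changing one would silently change the other.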

getRandomSequence

protected Sequence getRandomSequence(Random r,
                                     int length)
                              throws WrongAlphabetException,
                                     WrongSequenceTypeException
Description copied from class: HomogeneousTrainSM
This method creates a random Sequence from a trained homogeneous model.

Specified by:
getRandomSequence in class HomogeneousTrainSM
Parameters:
r - the random generator
length - the length of the Sequence
Returns:
the created Sequence
Throws:
WrongAlphabetException - if something is wrong with the alphabet
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
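
As an illustration of what drawing a random sequence from a trained homogeneous model involves, here is a minimal, self-contained sketch for the first-order case. It is not the Jstacs code: the class MarkovSamplerSketch, the integer symbol encoding, and the arrays start and trans are assumptions made for this example, and the real class supports arbitrary order.

```java
import java.util.Random;

/** Hypothetical first-order sampler; symbols are encoded as 0..A-1. */
final class MarkovSamplerSketch {
    /**
     * Draws a sequence of the given length.
     * start[a] is P(x_0 = a); trans[a][b] is P(x_i = b | x_{i-1} = a),
     * the same for every position i (homogeneity).
     */
    static int[] sample(double[] start, double[][] trans, Random r, int length) {
        int[] seq = new int[length];
        for (int i = 0; i < length; i++) {
            double[] distr = (i == 0) ? start : trans[seq[i - 1]];
            seq[i] = draw(distr, r);
        }
        return seq;
    }

    /** Inverse-CDF draw of an index from a discrete distribution. */
    static int draw(double[] distr, Random r) {
        double u = r.nextDouble();
        double cumulative = 0.0;
        for (int k = 0; k < distr.length - 1; k++) {
            cumulative += distr[k];
            if (u < cumulative) {
                return k;
            }
        }
        return distr.length - 1; // last bin also absorbs rounding error
    }
}
```

Passing a seeded Random makes the generated sequences reproducible.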

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name

getLogPriorTerm

public double getLogPriorTerm()
                       throws Exception
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) estimation, 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong

logProbFor

protected double logProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
Description copied from class: HomogeneousTrainSM
This method computes the logarithm of the probability of the given Sequence in the given interval. The method is only used in StatisticalModel.getLogProbFor(Sequence, int, int) after the method HomogeneousTrainSM.check(Sequence, int, int) has been invoked.

Specified by:
logProbFor in class HomogeneousTrainSM
Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Returns:
the logarithm of the probability for the given subsequence
See Also:
HomogeneousTrainSM.check(Sequence, int, int), StatisticalModel.getLogProbFor(Sequence, int, int)
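
The computation can be illustrated with a minimal, self-contained sketch of a homogeneous Markov model of arbitrary order. It rests on assumptions that this page does not confirm: symbols are encoded as integers, both interval bounds are inclusive, and the first positions of the interval use lower-order conditional distributions. The class HomogeneousLogProbSketch and its table layout are hypothetical and do not reflect the internals of Jstacs.

```java
/** Hypothetical order-k homogeneous Markov model over symbols 0..A-1. */
final class HomogeneousLogProbSketch {
    private final int alphabetSize;
    private final int order;
    // logProb[o][context][symbol] = log P(symbol | preceding o symbols),
    // where context encodes those o symbols as a base-A number.
    private final double[][][] logProb;

    HomogeneousLogProbSketch(int alphabetSize, int order, double[][][] logProb) {
        this.alphabetSize = alphabetSize;
        this.order = order;
        this.logProb = logProb;
    }

    /** Log-probability of seq[startpos..endpos], both bounds inclusive (assumption). */
    double logProbFor(int[] seq, int startpos, int endpos) {
        double sum = 0.0;
        for (int i = startpos; i <= endpos; i++) {
            // Near the start of the interval fewer than 'order' predecessors exist.
            int o = Math.min(i - startpos, order);
            int context = 0;
            for (int j = i - o; j < i; j++) {
                context = context * alphabetSize + seq[j];
            }
            sum += logProb[o][context][seq[i]];
        }
        return sum;
    }
}
```

Because the same conditional distributions are used at every position, the result depends only on the symbols inside the interval and not on where the interval lies in the sequence.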

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Overrides:
toString in class DiscreteGraphicalTrainSM
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance
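
The NumberFormat argument controls how numbers are rendered in the resulting String. The following self-contained sketch shows the idea using only java.text.NumberFormat. The class NumberFormatSketch and its methods are hypothetical helpers for this example, and the actual layout produced by HomogeneousMM.toString(NumberFormat) may differ.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper that renders probabilities with a caller-supplied format. */
final class NumberFormatSketch {
    /** Formats each probability with nf and joins the results with tabs. */
    static String format(double[] probabilities, NumberFormat nf) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < probabilities.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(nf.format(probabilities[i]));
        }
        return sb.toString();
    }

    /** A locale-independent format with at most three fraction digits. */
    static NumberFormat threeDecimals() {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setMaximumFractionDigits(3);
        return nf;
    }
}
```

Fixing the locale avoids output that switches between decimal points and decimal commas depending on the machine.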

train

public void train(DataSet data,
                  double[] weights)
           throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the sequence at position i, so the array weights must have as many entries as the data set has sequences. (Passing weights == null is allowed and means that all weights have the value one.)
This method should work non-incrementally: after the calls train(data1); train(data2), the model should be fully trained on data2 alone, not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences as DataSet
weights - the weights of the elements; each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
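
The contract described above (one weight per sequence, null meaning weight one, and non-incremental training) can be sketched without Jstacs as follows. To stay short, the sketch estimates only an order-0 model from weighted symbol counts. The class NonIncrementalTrainingSketch and the integer encoding of sequences are assumptions made for this example.

```java
/** Hypothetical order-0 model trained from weighted sequences over symbols 0..A-1. */
final class NonIncrementalTrainingSketch {
    private final int alphabetSize;
    private double[] counts; // null until train(...) has been called

    NonIncrementalTrainingSketch(int alphabetSize) {
        this.alphabetSize = alphabetSize;
    }

    /** Replaces any earlier training result; weights == null means weight one. */
    void train(int[][] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("need exactly one weight per sequence");
        }
        double[] fresh = new double[alphabetSize]; // start from scratch on every call
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            for (int symbol : data[i]) {
                fresh[symbol] += w;
            }
        }
        counts = fresh;
    }

    /** Maximum likelihood estimate of the probability of a symbol. */
    double probability(int symbol) {
        if (counts == null) {
            throw new IllegalStateException("model is not trained");
        }
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            throw new IllegalStateException("no weighted observations");
        }
        return counts[symbol] / total;
    }
}
```

Calling train twice leaves only the counts of the second call, which is the non-incremental behavior the description asks for.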

train

public void train(DataSet[] data,
                  double[][] weights)
           throws Exception
Description copied from class: HomogeneousTrainSM
Trains the homogeneous model using an array of weighted DataSets. The Sequence weights in weights[i] are for the DataSet in data[i].

Specified by:
train in class HomogeneousTrainSM
Parameters:
data - the given DataSets
weights - the Sequence weights; weights[i] holds the weights for data[i]
Throws:
Exception - if something went wrong; data.length must equal weights.length
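
How several weighted data sets can be pooled into one set of statistics is sketched below for first-order transition counts. This is an illustration under assumptions, not the Jstacs implementation: the class PooledTrainingSketch and the integer encoding are hypothetical, and the sketch expects explicit weights for every data set.

```java
/** Hypothetical helper that pools weighted transition counts over data sets. */
final class PooledTrainingSketch {
    /**
     * data[d][n] is sequence n of data set d, with symbols 0..A-1;
     * weights[d][n] is the weight of that sequence.
     * Returns counts[a][b], the weighted number of transitions a -> b.
     */
    static double[][] transitionCounts(int alphabetSize, int[][][] data, double[][] weights) {
        if (data.length != weights.length) {
            throw new IllegalArgumentException("data.length must equal weights.length");
        }
        double[][] counts = new double[alphabetSize][alphabetSize];
        for (int d = 0; d < data.length; d++) {
            if (data[d].length != weights[d].length) {
                throw new IllegalArgumentException("need one weight per sequence in data set " + d);
            }
            for (int n = 0; n < data[d].length; n++) {
                int[] seq = data[d][n];
                for (int i = 1; i < seq.length; i++) {
                    counts[seq[i - 1]][seq[i]] += weights[d][n];
                }
            }
        }
        return counts;
    }
}
```

Since the model is homogeneous, transitions from all positions and all data sets contribute to the same count table.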

getFurtherModelInfos

protected StringBuffer getFurtherModelInfos()
Description copied from class: DiscreteGraphicalTrainSM
Returns further model information as a StringBuffer.

Specified by:
getFurtherModelInfos in class DiscreteGraphicalTrainSM
Returns:
further model information like parameters of the distribution etc. in XML format
See Also:
DiscreteGraphicalTrainSM.toXML()

getXMLTag

protected String getXMLTag()
Description copied from class: DiscreteGraphicalTrainSM
Returns the XML tag that is used for this model in DiscreteGraphicalTrainSM.fromXML(StringBuffer) and DiscreteGraphicalTrainSM.toXML().

Specified by:
getXMLTag in class DiscreteGraphicalTrainSM
Returns:
the XML tag that is used in DiscreteGraphicalTrainSM.fromXML(StringBuffer) and DiscreteGraphicalTrainSM.toXML()
See Also:
DiscreteGraphicalTrainSM.fromXML(StringBuffer), DiscreteGraphicalTrainSM.toXML()

set

protected void set(DGTrainSMParameterSet params,
                   boolean trained)
            throws CloneNotSupportedException,
                   NonParsableException
Description copied from class: DiscreteGraphicalTrainSM
Sets the parameters as internal parameters and does some essential computations. Used in fromParameterSet-methods.

Overrides:
set in class HomogeneousTrainSM
Parameters:
params - the new ParameterSet
trained - indicates if the model is trained or not
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
NonParsableException - if the parameters of the model could not be parsed

setFurtherModelInfos

protected void setFurtherModelInfos(StringBuffer xml)
                             throws NonParsableException
Description copied from class: DiscreteGraphicalTrainSM
This method replaces the internal model information with the information from a StringBuffer.

Specified by:
setFurtherModelInfos in class DiscreteGraphicalTrainSM
Parameters:
xml - contains the model information like parameters of the distribution etc. in XML format
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
DiscreteGraphicalTrainSM.fromXML(StringBuffer)