de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models
Class SamplingHigherOrderHMM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.HigherOrderHMM
              extended by de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.SamplingHigherOrderHMM
All Implemented Interfaces:
SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable
Direct Known Subclasses:
SamplingPhyloHMM

public class SamplingHigherOrderHMM
extends HigherOrderHMM

Author:
Michael Scharfe, Jens Keilwagen

Nested Class Summary
static class SamplingHigherOrderHMM.ViterbiComputation
          Enumeration of all possible Viterbi path methods
 
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.HigherOrderHMM
HigherOrderHMM.Type
 
Field Summary
protected  BurnInTest burnInTest
          This variable holds the BurnInTest used for training the model
protected  boolean hasSampled
          This boolean indicates if the parameters for the model were sampled
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.HigherOrderHMM
backwardIntermediate, container, logEmission, numberOfSummands, skipInit, stateList
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
bwdMatrix, emission, emissionIdx, finalState, forward, fwdMatrix, name, sostream, START_NODE, states, threads, trainingParameter, transition
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
SamplingHigherOrderHMM(SamplingHMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, SamplingEmission[] emission, TransitionElement... te)
          This is the main constructor.
SamplingHigherOrderHMM(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
protected  void acceptParameters()
          This method can be used to accept the current parameters (and save them into a file)
protected  void appendFurtherInformation(StringBuffer xml)
          This method appends further information to the XML representation.
 SamplingHigherOrderHMM clone()
          Follows the conventions of Object's clone()-method.
protected  void createStates()
          This method creates states for the internal usage.
protected  void drawFromStatistics()
          This method draws all parameters for the current statistics
protected  void extractFurtherInformation(StringBuffer xml)
          This method extracts further information from the XML representation.
protected  void furtherInits(DataSet data, double[] weights)
          This method allows the implementation of further initializations
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
protected  double getLogPosteriorFromStatistic()
          This method calculates the a posteriori probability for the current statistics
 double getLogProbForPath(IntList path, int startPos, Sequence seq)
           
 double[][] getLogStatePosteriorMatrixFor(int startPos, int endPos, Sequence seq)
          This method returns the log state posterior of all states for a sequence.
protected  void getNewParameters()
          This method sets all parameters for the next sampling step
 Pair<IntList,Double> getViterbiPath(int startPos, int endPos, Sequence seq, SamplingHigherOrderHMM.ViterbiComputation compute)
          This method returns a Viterbi path that is optimal for the chosen ViterbiComputation method
 Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq)
           
protected  String getXMLTag()
          Returns the tag for the XML representation.
protected  double gibbsSampling(int startPos, int endPos, double weight, Sequence seq)
          This method implements a sampling step in the sampling procedure
protected  void gibbsSamplingStep(int sampling, int steps, boolean append, DataSet data, double[] weights)
          This method implements the next step(s) in the sampling procedure
protected  void initTraining(DataSet data, double[] weights)
          This method initializes the training procedure with the given training data
 boolean isInitialized()
          This method can be used to determine whether the instance is initialized.
protected  double logProb(int startpos, int endpos, Sequence sequence)
          This method computes the logarithm of the probability of the corresponding subsequences.
protected  boolean parseNextParameterSet()
          This method parses a parameter set stored in a file during sampling
protected  boolean parseParameterSet(int sampling, int idx)
          This method allows the user to parse the set of parameters with index idx of a certain sampling (from a file).
 void train(DataSet data, double[] weights)
          Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.HigherOrderHMM
baumWelch, createHelperVariables, estimateFromStatistics, fillBwdMatrix, fillBwdOrViterbiMatrix, fillFwdMatrix, fillLogStatePosteriorMatrix, finalize, getCharacteristics, getLogPriorTerm, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, getNumericalCharacteristics, initialize, initializeRandomly, resetStatistics, samplePath, viterbi
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
createMatrixForStatePosterior, decodePath, decodeStatePosterior, determineFinalStates, fromXML, getFinalStatePosterioriMatrix, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getLogProbFor, getLogStatePosteriorMatrixFor, getNumberOfStates, getNumberOfThreads, getRunTimeException, getStatePosteriorMatrixFor, getStatePosteriorMatrixFor, getViterbiPathFor, getViterbiPathsFor, initTransition, provideMatrix, setOutputStream, toString, toXML, train
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

burnInTest

protected BurnInTest burnInTest
This variable holds the BurnInTest used for training the model


hasSampled

protected boolean hasSampled
This boolean indicates if the parameters for the model were sampled

Constructor Detail

SamplingHigherOrderHMM

public SamplingHigherOrderHMM(SamplingHMMTrainingParameterSet trainingParameterSet,
                              String[] name,
                              int[] emissionIdx,
                              boolean[] forward,
                              SamplingEmission[] emission,
                              TransitionElement... te)
                       throws Exception
This is the main constructor.

Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used
emission - the emissions
te - the TransitionElements building a transition
Throws:
Exception - if
  • some component could not be cloned
  • the length of name, emissionIdx, or forward is not equal to the number of states
  • not all emissions use the same AlphabetContainer
  • the states can not be handled by the transition

SamplingHigherOrderHMM

public SamplingHigherOrderHMM(StringBuffer xml)
                       throws NonParsableException
The standard constructor for the interface Storable. Constructs a SamplingHigherOrderHMM out of an XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the SamplingHigherOrderHMM could not be reconstructed out of the StringBuffer xml
Method Detail

clone

public SamplingHigherOrderHMM clone()
                             throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class HigherOrderHMM
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. Object o = (X)super.clone();
2. all additional member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
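
The three steps above can be sketched with a small stand-alone class. This is only an illustration of the convention: CloneSketch and its members are hypothetical names and are not part of Jstacs.

```java
// Hypothetical class illustrating the clone() convention described above;
// it is not part of Jstacs.
final class CloneSketch implements Cloneable {
    private final String name; // immutable, so it may be shared between copies
    private int[] counts;      // mutable array, so it must be deeply copied

    CloneSketch(String name, int[] counts) {
        this.name = name;
        this.counts = counts.clone();
    }

    @Override
    public CloneSketch clone() throws CloneNotSupportedException {
        // step 1: let Object.clone() create the shallow copy
        CloneSketch copy = (CloneSketch) super.clone();
        // step 2: deeply copy every member that is not a simple data type
        copy.counts = counts.clone();
        // step 3: return the copy
        return copy;
    }

    void increment(int i) { counts[i]++; }

    int get(int i) { return counts[i]; }

    String getName() { return name; }
}
```

After cloning, a change to the array of the copy does not affect the original, which is the property the deep copy in step 2 is meant to guarantee.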

createStates

protected void createStates()
Description copied from class: AbstractHMM
This method creates states for the internal usage.

Overrides:
createStates in class HigherOrderHMM

acceptParameters

protected void acceptParameters()
                         throws IOException
This method can be used to accept the current parameters (and save them into a file)

Throws:
IOException - if the parameters could not be written

drawFromStatistics

protected void drawFromStatistics()
                           throws Exception
This method draws all parameters for the current statistics

Throws:
Exception - if the parameters could not be drawn
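
As a rough illustration of drawing parameters from sufficient statistics, the following sketch assumes that the statistics are weighted counts for one discrete distribution and that the prior is a Dirichlet with the given pseudo counts. Both are assumptions made for this example, since the description above does not state which distributions are used. DirichletDrawSketch and its methods are hypothetical and not part of Jstacs.

```java
import java.util.Random;

// Hypothetical helper, not part of Jstacs: draws a probability vector from
// Dirichlet(counts + pseudoCounts) by normalizing independent Gamma draws.
final class DirichletDrawSketch {

    // Marsaglia-Tsang sampler for Gamma(shape, 1) with shape > 0
    static double gamma(double shape, Random rnd) {
        if (shape < 1.0) {
            // boosting: Gamma(a) is distributed as Gamma(a + 1) * U^(1/a)
            return gamma(shape + 1.0, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rnd.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue; // reject proposals outside the support
            }
            v = v * v * v;
            double u = rnd.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    // counts: statistics gathered from the data; pseudoCounts: assumed prior
    static double[] draw(double[] counts, double[] pseudoCounts, Random rnd) {
        double[] p = new double[counts.length];
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            p[i] = gamma(counts[i] + pseudoCounts[i], rnd);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // normalize so that the entries sum to one
        }
        return p;
    }
}
```

A call such as DirichletDrawSketch.draw(new double[]{10, 5, 1, 0}, new double[]{1, 1, 1, 1}, new Random(42)) returns a probability vector whose entries tend to be larger where the counts are larger.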

gibbsSampling

protected double gibbsSampling(int startPos,
                               int endPos,
                               double weight,
                               Sequence seq)
                        throws Exception
This method implements a sampling step in the sampling procedure

Parameters:
startPos - the start position in the sequence
endPos - the end position in the sequence
weight - the weight for the sequence
seq - the sequence
Returns:
the score for the sampling step
Throws:
Exception - if the sampling step did not succeed

gibbsSamplingStep

protected void gibbsSamplingStep(int sampling,
                                 int steps,
                                 boolean append,
                                 DataSet data,
                                 double[] weights)
                          throws Exception
This method implements the next step(s) in the sampling procedure

Parameters:
sampling - the index of the sampling
steps - the number of sampling steps that should be executed
append - whether to append the sampled parameters to an existing file or to overwrite the file
data - the data used for sampling
weights - the weight for each sequence
Throws:
Exception - if something went wrong

getNewParameters

protected void getNewParameters()
                         throws Exception
This method sets all parameters for the next sampling step

Throws:
Exception - if something went wrong

train

public void train(DataSet data,
                  double[] weights)
           throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as the data set has sequences. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally. That means the result of the following series: train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.

Specified by:
train in interface TrainableStatisticalModel
Overrides:
train in class HigherOrderHMM
Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
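
The contract above (one weight per element, null meaning a weight of one for every element, and non-incremental training) can be illustrated with a toy model that only estimates symbol frequencies. NonIncrementalTrainingSketch is a hypothetical class for this example and does not use the Jstacs DataSet type.

```java
// Hypothetical toy model, not part of Jstacs: estimates symbol frequencies
// and follows the train(data, weights) contract described above.
final class NonIncrementalTrainingSketch {
    private double[] freq; // estimated symbol frequencies of the last training

    // data[i] is the symbol (0 .. alphabetSize-1) of element i; weights may be null
    void train(int[] data, double[] weights, int alphabetSize) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("one weight per element is required");
        }
        // fresh statistics on every call, so earlier data is forgotten
        double[] counts = new double[alphabetSize];
        double total = 0.0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i]; // null means weight one
            counts[data[i]] += w;
            total += w;
        }
        for (int a = 0; a < alphabetSize; a++) {
            counts[a] /= total;
        }
        freq = counts;
    }

    double getFrequency(int symbol) { return freq[symbol]; }
}
```

Calling train on a first data set and then on a second one leaves the toy model with the frequencies of the second data set only, which mirrors the non-incremental behaviour required of train.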

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Overrides:
getInstanceName in class HigherOrderHMM
Returns:
a short instance name

isInitialized

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized you should be able to invoke SequenceScore.getLogScoreFor(Sequence).

Specified by:
isInitialized in interface SequenceScore
Overrides:
isInitialized in class HigherOrderHMM
Returns:
true if the instance is initialized, false otherwise

logProb

protected double logProb(int startpos,
                         int endpos,
                         Sequence sequence)
                  throws Exception
Description copied from class: AbstractHMM
This method computes the logarithm of the probability of the corresponding subsequences. The method does not check the AlphabetContainer and possible further features before starting the computation.

Overrides:
logProb in class AbstractHMM
Parameters:
startpos - the start position (inclusive)
endpos - the end position (inclusive)
sequence - the Sequence(s)
Returns:
the logarithm of the probability
Throws:
Exception - if the model has no parameters (for instance if it is not trained)
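
To make the inclusive start and end positions concrete, here is a minimal log-space forward computation for a plain first-order HMM. It is a simplified sketch and not the implementation used by this class, which handles higher-order transitions and sampled parameter sets. ForwardSketch and all parameter names are hypothetical.

```java
// Hypothetical first-order sketch, not part of Jstacs: computes
// log P(seq[startPos..endPos]) with the forward algorithm in log space.
final class ForwardSketch {

    // numerically stable log(exp(a) + exp(b))
    static double logSum(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double m = Math.max(a, b);
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    // startPos and endPos are both inclusive, as in the description above
    static double logProb(int startPos, int endPos, int[] seq,
                          double[] logInit, double[][] logTrans, double[][] logEmit) {
        int n = logInit.length;
        double[] fwd = new double[n];
        for (int s = 0; s < n; s++) {
            fwd[s] = logInit[s] + logEmit[s][seq[startPos]];
        }
        for (int t = startPos + 1; t <= endPos; t++) {
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < n; r++) {
                    acc = logSum(acc, fwd[r] + logTrans[r][s]);
                }
                next[s] = acc + logEmit[s][seq[t]];
            }
            fwd = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < n; s++) {
            total = logSum(total, fwd[s]);
        }
        return total;
    }
}
```

With uniform emissions over two symbols, every subsequence of length L has probability 0.5^L regardless of the transitions, which gives a simple sanity check for the position handling.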

parseParameterSet

protected boolean parseParameterSet(int sampling,
                                    int idx)
                             throws Exception
This method allows the user to parse the set of parameters with index idx of a certain sampling (from a file). The internal numbering should start with 0.

Parameters:
sampling - the index of the sampling
idx - the index of the parameter set
Returns:
true if the parameter set could be parsed
Throws:
Exception - if there is a problem with parsing the parameters

parseNextParameterSet

protected boolean parseNextParameterSet()
                                 throws Exception
This method parses a parameter set stored in a file during sampling

Returns:
true if parsing was successful
Throws:
Exception - if the parameters could not be parsed

getXMLTag

protected String getXMLTag()
Description copied from class: AbstractHMM
Returns the tag for the XML representation.

Overrides:
getXMLTag in class HigherOrderHMM
Returns:
the tag for the XML representation
See Also:
AbstractHMM.fromXML(StringBuffer), AbstractHMM.toXML()

appendFurtherInformation

protected void appendFurtherInformation(StringBuffer xml)
Description copied from class: AbstractHMM
This method appends further information to the XML representation. It allows subclasses to save further parameters that are not defined in the superclass.

Overrides:
appendFurtherInformation in class HigherOrderHMM
Parameters:
xml - the XML representation

extractFurtherInformation

protected void extractFurtherInformation(StringBuffer xml)
                                  throws NonParsableException
Description copied from class: HigherOrderHMM
This method extracts further information from the XML representation. It allows subclasses to cast further parameters that are not defined in the superclass.

Overrides:
extractFurtherInformation in class HigherOrderHMM
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml

initTraining

protected void initTraining(DataSet data,
                            double[] weights)
                     throws Exception
This method initializes the training procedure with the given training data

Parameters:
data - the data set used for training
weights - the weight for each sequence
Throws:
Exception - if the transition or emissions could not be initialized

furtherInits

protected void furtherInits(DataSet data,
                            double[] weights)
                     throws Exception
This method allows the implementation of further initializations

Parameters:
data - the current data set
weights - the weight for each sequence
Throws:
Exception - if the init steps did not succeed

getLogStatePosteriorMatrixFor

public double[][] getLogStatePosteriorMatrixFor(int startPos,
                                                int endPos,
                                                Sequence seq)
                                         throws Exception
Description copied from class: AbstractHMM
This method returns the log state posterior of all states for a sequence.

Overrides:
getLogStatePosteriorMatrixFor in class AbstractHMM
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
the score for each state and each sequence position
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...

getLogProbForPath

public double getLogProbForPath(IntList path,
                                int startPos,
                                Sequence seq)
                         throws Exception
Overrides:
getLogProbForPath in class HigherOrderHMM
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Returns:
the logarithm of the probability for the given path and the given sequence(s)
Throws:
Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...
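
For a first-order HMM with a fixed parameter set, the joint log probability of a state path and a sequence is the sum of the log transition and log emission terms along the path. The sketch below shows this under those simplifying assumptions. It is not the implementation of this class, and PathLogProbSketch and its parameters are hypothetical names.

```java
// Hypothetical first-order sketch, not part of Jstacs: log P(path, sequence)
// for a path that starts at position startPos (inclusive) of the sequence.
final class PathLogProbSketch {

    static double logProbForPath(int[] path, int startPos, int[] seq,
                                 double[] logInit, double[][] logTrans, double[][] logEmit) {
        if (path.length == 0) {
            return 0.0; // empty path: log of probability one
        }
        // first state: initial probability and emission of the first symbol
        double lp = logInit[path[0]] + logEmit[path[0]][seq[startPos]];
        // remaining states: one transition and one emission per position
        for (int i = 1; i < path.length; i++) {
            lp += logTrans[path[i - 1]][path[i]] + logEmit[path[i]][seq[startPos + i]];
        }
        return lp;
    }
}
```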

getViterbiPathFor

public Pair<IntList,Double> getViterbiPathFor(int startPos,
                                              int endPos,
                                              Sequence seq)
                                       throws Exception
Overrides:
getViterbiPathFor in class HigherOrderHMM
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
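
The returned pair of state path and score can be illustrated with a textbook Viterbi recursion for a first-order HMM in log space. This sketch uses a single fixed parameter set and is not the implementation of this class. ViterbiSketch and its nested Result holder (a stand-in for Pair<IntList,Double>) are hypothetical names.

```java
// Hypothetical first-order sketch, not part of Jstacs: Viterbi decoding in
// log space for seq[startPos..endPos] (both positions inclusive).
final class ViterbiSketch {

    // simple stand-in for a pair of state path and score
    static final class Result {
        final int[] path;
        final double score;

        Result(int[] path, double score) {
            this.path = path;
            this.score = score;
        }
    }

    static Result viterbi(int startPos, int endPos, int[] seq,
                          double[] logInit, double[][] logTrans, double[][] logEmit) {
        int n = logInit.length;
        int len = endPos - startPos + 1;
        double[][] best = new double[len][n]; // best log score ending in state s
        int[][] back = new int[len][n];       // predecessor state on that best path
        for (int s = 0; s < n; s++) {
            best[0][s] = logInit[s] + logEmit[s][seq[startPos]];
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < n; s++) {
                int arg = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < n; r++) {
                    double v = best[t - 1][r] + logTrans[r][s];
                    if (v > max) {
                        max = v;
                        arg = r;
                    }
                }
                best[t][s] = max + logEmit[s][seq[startPos + t]];
                back[t][s] = arg;
            }
        }
        // pick the best final state and follow the back pointers
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (best[len - 1][s] > best[len - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int t = len - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return new Result(path, best[len - 1][last]);
    }
}
```

The score in the result is the joint log probability of the decoded path and the subsequence, so it is never larger than the log probability of the subsequence summed over all paths.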

getViterbiPath

public Pair<IntList,Double> getViterbiPath(int startPos,
                                           int endPos,
                                           Sequence seq,
                                           SamplingHigherOrderHMM.ViterbiComputation compute)
                                    throws Exception
This method returns a Viterbi path that is optimal for the chosen ViterbiComputation method

Parameters:
startPos - the start position in the sequence
endPos - the end position in the sequence
seq - the sequence
compute - the ViterbiComputation method
Returns:
the pair of path and score
Throws:
Exception - if the parameters could not be parsed from file

getLogPosteriorFromStatistic

protected double getLogPosteriorFromStatistic()
This method calculates the a posteriori probability for the current statistics

Returns:
the logarithm of the a posteriori probability for the current statistics