public class HigherOrderHMM extends AbstractHMM
A state is called a final state if it is allowed at the end of a path. Hence, any valid path always ends with a final state.
Calling the method AbstractTrainableStatisticalModel.getLogProbFor(Sequence) for a sequence returns the logarithm of the probability of the sequence summed over all valid paths. Declaring all states to be final states leads to the computation of the likelihood.
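The role of final states can be illustrated with a small, self-contained sketch. It is not the Jstacs implementation: it assumes a first-order HMM with explicit initial probabilities, and the class ForwardSketch with its methods logSum, log, and logProb are hypothetical names chosen for this illustration. The forward recursion runs in log space, and the final sum covers only states marked as final, so marking every state as final yields the likelihood of the sequence.

```java
// Sketch only: a first-order HMM in log space, not the Jstacs implementation.
// All names in this class are hypothetical and chosen for illustration.
final class ForwardSketch {

    /** Numerically stable log(exp(a) + exp(b)). */
    static double logSum(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
    }

    /** Element-wise natural logarithm of a vector. */
    static double[] log(double[] p) {
        double[] res = new double[p.length];
        for (int i = 0; i < p.length; i++) res[i] = Math.log(p[i]);
        return res;
    }

    /** Element-wise natural logarithm of a matrix. */
    static double[][] log(double[][] p) {
        double[][] res = new double[p.length][];
        for (int i = 0; i < p.length; i++) res[i] = log(p[i]);
        return res;
    }

    /**
     * Returns the log of the probability of seq summed over all paths
     * that end in a state s with isFinal[s] == true.
     * logInit[s]: log initial probability, logTrans[r][s]: log transition r -> s,
     * logEmit[s][a]: log probability that state s emits symbol a.
     */
    static double logProb(double[] logInit, double[][] logTrans, double[][] logEmit,
                          boolean[] isFinal, int[] seq) {
        int n = logInit.length;
        double[] fwd = new double[n];
        for (int s = 0; s < n; s++) {
            fwd[s] = logInit[s] + logEmit[s][seq[0]];
        }
        for (int t = 1; t < seq.length; t++) {
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < n; r++) {
                    acc = logSum(acc, fwd[r] + logTrans[r][s]);
                }
                next[s] = acc + logEmit[s][seq[t]];
            }
            fwd = next;
        }
        // Only paths ending in a final state are valid.
        double total = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < n; s++) {
            if (isFinal[s]) total = logSum(total, fwd[s]);
        }
        return total;
    }
}
```

Passing an isFinal array that is true for every state gives the ordinary log-likelihood; restricting it lowers the value because fewer paths are counted.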
| Modifier and Type | Class and Description |
|---|---|
| protected static class | HigherOrderHMM.Type: This enum defines different types of computations that will be done using the backward algorithm. |
| Modifier and Type | Field and Description |
|---|---|
| protected double[] | backwardIntermediate: Helper variable, only for internal use. |
| protected int[] | container: Helper variable, only for internal use. |
| protected double[] | logEmission: Helper variable, only for internal use. |
| protected int[][] | numberOfSummands: Helper variable, only for internal use. |
| protected boolean | skipInit: Indicates if the model should be initialized (randomly) before optimization. |
| protected IntList | stateList: Helper variable, only for internal use. |
bwdMatrix, emission, emissionIdx, finalState, forward, fwdMatrix, name, sostream, START_NODE, states, threads, trainingParameter, transition

alphabets, length

| Constructor and Description |
|---|
| HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te): This is a convenience constructor. |
| HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te): This is the main constructor. |
| HigherOrderHMM(StringBuffer xml): The standard constructor for the interface Storable. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | appendFurtherInformation(StringBuffer xml): This method appends further information to the XML representation. |
| protected double | baumWelch(int startPos, int endPos, double weight, Sequence seq): This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm. |
| HigherOrderHMM | clone(): Follows the conventions of Object's clone()-method. |
| protected void | createHelperVariables(): This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices. |
| protected void | createStates(): This method creates states for internal usage. |
| protected void | estimateFromStatistics(): This method estimates the parameters of all emissions and the transition using their sufficient statistics. |
| protected void | extractFurtherInformation(StringBuffer xml): This method extracts further information from the XML representation. |
| protected void | fillBwdMatrix(int startPos, int endPos, Sequence seq): This method fills the backward matrix for a given sequence. |
| protected void | fillBwdOrViterbiMatrix(HigherOrderHMM.Type t, int startPos, int endPos, double weight, Sequence seq): This method computes the entries of the backward or the Viterbi matrix. |
| protected void | fillFwdMatrix(int startPos, int endPos, Sequence seq): This method fills the forward matrix for a given sequence. |
| protected void | fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero): This method fills the log state posterior of Sequence seq in a given matrix. |
| protected void | finalize() |
| ResultSet | getCharacteristics(): Returns some information characterizing or describing the current instance. |
| int[] | getEmissionIndexes(): Returns a clone of the internal array of emission indexes that represent which emission is used in which state. |
| Emission[] | getEmissions(): Returns a clone of the internal emissions. |
| String | getInstanceName(): Should return a short instance name such as iMM(0), BN(2), ... |
| double | getLogPriorTerm(): Returns a value that is proportional to the log of the prior. |
| double | getLogProbForPath(IntList path, int startPos, Sequence seq) |
| double[] | getLogScoreFor(DataSet data): This method computes the logarithm of the scores of all sequences in the given data set. |
| void | getLogScoreFor(DataSet data, double[] res): This method computes and stores the logarithm of the scores for any sequence in the data set in the given double-array. |
| byte | getMaximalMarkovOrder(): This method returns the maximal used Markov order, if possible. |
| String[] | getNames(): Returns a clone of the state names. |
| NumericalResultSet | getNumericalCharacteristics(): Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics(). |
| HMMTrainingParameterSet | getTrainingParams(): Returns a clone of the training parameters. |
| TransitionElement[] | getTransisionElements(): Returns the transition elements of the internal Transition. |
| Pair<IntList,Double> | getViterbiPathFor(int startPos, int endPos, Sequence seq) |
| protected String | getXMLTag(): Returns the tag for the XML representation. |
| protected void | initialize(DataSet data, double[] weight): This method initializes all emissions and the transition. |
| void | initializeRandomly(): This method initializes all emissions and the transition randomly. |
| boolean | isInitialized(): This method can be used to determine whether the instance is initialized. |
| protected void | resetStatistics(): This method resets all sufficient statistics of all emissions and the transition. |
| void | samplePath(IntList path, int startPos, int endPos, Sequence seq): This method samples a valid path for the given sequence seq using the internal parameters. |
| void | setSkiptInit(boolean skip) |
| void | train(DataSet data, double[] weights): Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. |
| protected double | viterbi(IntList path, int startPos, int endPos, double weight, Sequence seq): This method computes the Viterbi score of a given sequence seq. |
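The Viterbi computation named in the viterbi and getViterbiPathFor entries can be sketched as follows. This is a minimal illustration, not the Jstacs code: it assumes a first-order HMM with explicit initial probabilities, ignores final states and silent states, and uses the hypothetical names ViterbiSketch and Result instead of the library's Pair<IntList,Double>.

```java
// Sketch only: first-order Viterbi in log space, not the Jstacs implementation.
// ViterbiSketch and Result are hypothetical names used for illustration.
final class ViterbiSketch {

    /** Holds the best state path and its log score. */
    static final class Result {
        final int[] path;
        final double logScore;

        Result(int[] path, double logScore) {
            this.path = path;
            this.logScore = logScore;
        }
    }

    /**
     * logInit[s]: log initial probability, logTrans[r][s]: log transition r -> s,
     * logEmit[s][a]: log probability that state s emits symbol a.
     */
    static Result viterbi(double[] logInit, double[][] logTrans, double[][] logEmit, int[] seq) {
        int n = logInit.length;
        int len = seq.length;
        double[][] score = new double[len][n];
        int[][] back = new int[len][n];

        for (int s = 0; s < n; s++) {
            score[0][s] = logInit[s] + logEmit[s][seq[0]];
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int r = 0; r < n; r++) {
                    double candidate = score[t - 1][r] + logTrans[r][s];
                    if (candidate > best) {
                        best = candidate;
                        arg = r;
                    }
                }
                score[t][s] = best + logEmit[s][seq[t]];
                back[t][s] = arg; // remember the best predecessor of s at position t
            }
        }

        // Pick the best last state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[len - 1][s] > score[len - 1][last]) last = s;
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int t = len - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return new Result(path, score[len - 1][last]);
    }
}
```

In contrast to the forward computation, the maximum replaces the sum, so the returned score belongs to a single path rather than to all paths.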
createMatrixForStatePosterior, decodePath, decodeStatePosterior, determineFinalStates, fromXML, getFinalStatePosterioriMatrix, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getLogProbFor, getLogStatePosteriorMatrixFor, getLogStatePosteriorMatrixFor, getNumberOfStates, getNumberOfThreads, getRunTimeException, getStatePosteriorMatrixFor, getStatePosteriorMatrixFor, getViterbiPathFor, getViterbiPathsFor, initTransition, logProb, provideMatrix, setOutputStream, toString, toXML, train

check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString

protected int[] container
See Also:
Transition.fillTransitionInformation(int, int, int, int[])

protected double[] logEmission
See Also:
AbstractHMM.emission

protected double[] backwardIntermediate
See Also:
numberOfSummands

protected int[][] numberOfSummands

protected IntList stateList
See Also:
samplePath(IntList, int, int, Sequence)

protected boolean skipInit
public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is a convenience constructor: state i uses emission i on the forward strand.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
See Also:
AlphabetContainer, HigherOrderHMM(HMMTrainingParameterSet, String[], int[], boolean[], Emission[], de.jstacs.sequenceScores.statisticalModels.trainable.hmm.transitions.BasicHigherOrderTransition.AbstractTransitionElement...)

public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is the main constructor.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
See Also:
AlphabetContainer

public HigherOrderHMM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Constructs a HigherOrderHMM out of an XML representation.
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the HigherOrderHMM could not be reconstructed out of the StringBuffer xml

protected void createHelperVariables()
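Description copied from class: AbstractHMM
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices.
createHelperVariables in class AbstractHMM

protected String getXMLTag()
Description copied from class: AbstractHMM
Returns the tag for the XML representation.
getXMLTag in class AbstractHMM
See Also:
AbstractHMM.fromXML(StringBuffer), AbstractHMM.toXML()

protected void appendFurtherInformation(StringBuffer xml)
Description copied from class: AbstractHMM
This method appends further information to the XML representation.
appendFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation

protected void extractFurtherInformation(StringBuffer xml) throws NonParsableException
This method extracts further information from the XML representation.
extractFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml

public HigherOrderHMM clone() throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractHMM
Returns:
a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning

protected void createStates()
Description copied from class: AbstractHMM
This method creates states for internal usage.
createStates in class AbstractHMM

public double getLogPriorTerm()
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior.

public double getLogProbForPath(IntList path, int startPos, Sequence seq) throws Exception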
getLogProbForPath in class AbstractHMM
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws:
Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...

protected void fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) throws Exception
Description copied from class: AbstractHMM
This method fills the log state posterior of Sequence seq in a given matrix.
fillLogStatePosteriorMatrix in class AbstractHMM
Parameters:
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws:
Exception - if an error occurs during the computation
See Also:
AbstractHMM.getLogStatePosteriorMatrixFor(int, int, Sequence), AbstractHMM.createMatrixForStatePosterior(int, int)

protected void fillFwdMatrix(int startPos, int endPos, Sequence seq) throws OperationNotSupportedException, WrongLengthException
Description copied from class: AbstractHMM
This method fills the forward matrix for a given sequence.
fillFwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
OperationNotSupportedException
WrongLengthException

protected void fillBwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
Description copied from class: AbstractHMM
This method fills the backward matrix for a given sequence.
fillBwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation

protected void fillBwdOrViterbiMatrix(HigherOrderHMM.Type t, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the entries of the backward or the Viterbi matrix.
Parameters:
t - a switch that selects the computation mode
startPos - start position of the sequence
endPos - end position of the sequence
weight - the given external weight of the sequence (only used for Baum-Welch)
seq - the sequence
Throws:
Exception - forwarded from TrainableState.addToStatistic(int, int, double, de.jstacs.data.sequences.Sequence) and State.getLogScoreFor(int, int, Sequence)

public Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq) throws Exception
getViterbiPathFor in class AbstractHMM
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...

protected double viterbi(IntList path, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the Viterbi score of a given sequence seq. Furthermore, it allows either to modify the sufficient statistics according to the Viterbi training algorithm or to compute the Viterbi path, which will in this case be returned in path.
Parameters:
path - if null, Viterbi training; otherwise, computation of the Viterbi path
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

protected double baumWelch(int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm.
Parameters:
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

public void train(DataSet data, double[] weights) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as there are sequences in the data set. (Optionally, it is possible to use weights == null if all weights have the value one.)
The result of the sequence of calls train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator

protected void initialize(DataSet data, double[] weight) throws Exception
This method initializes all emissions and the transition.
Parameters:
data - the data set
weight - the weights for each sequence of the data set
Throws:
Exception - if an error occurs during the initialization
See Also:
initializeRandomly()

public void setSkiptInit(boolean skip)
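The weight convention documented for train(DataSet, double[]), which initialize(DataSet, double[]) also receives, can be sketched in isolation. The helper below is hypothetical and not part of Jstacs; in particular, throwing IllegalArgumentException is an assumption of this sketch, since the documentation only states that an Exception is thrown when the dimensions do not match.

```java
// Sketch only: WeightSketch and effectiveWeights are hypothetical names,
// and the exception type is an assumption made for this illustration.
final class WeightSketch {

    /**
     * Returns one weight per sequence: all ones if weights == null,
     * otherwise a validated copy of the given weights.
     */
    static double[] effectiveWeights(int numberOfSequences, double[] weights) {
        double[] res = new double[numberOfSequences];
        if (weights == null) {
            // null is shorthand for "every sequence has weight one"
            for (int i = 0; i < res.length; i++) res[i] = 1.0;
            return res;
        }
        if (weights.length != numberOfSequences) {
            throw new IllegalArgumentException(
                "expected " + numberOfSequences + " weights but got " + weights.length);
        }
        for (int i = 0; i < res.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("weight " + i + " is negative");
            }
            res[i] = weights[i]; // weight i belongs to sequence i
        }
        return res;
    }
}
```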
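
public void initializeRandomly()
This method initializes all emissions and the transition randomly.

protected void resetStatistics()
This method resets all sufficient statistics of all emissions and the transition.

protected void estimateFromStatistics()
This method estimates the parameters of all emissions and the transition using their sufficient statistics.
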
public final byte getMaximalMarkovOrder() throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
getMaximalMarkovOrder in interface StatisticalModel
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer

public ResultSet getCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance.
getCharacteristics in interface SequenceScore
getCharacteristics in class AbstractTrainableStatisticalModel
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

public double[] getLogScoreFor(DataSet data) throws Exception
Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the given data set.
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the data set of sequences
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence)

public void getLogScoreFor(DataSet data, double[] res) throws Exception
Description copied from interface: SequenceScore
This method computes and stores the logarithm of the scores for any sequence in the data set in the given double-array.
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the data set of sequences
res - the array for the results, has to have length data.getNumberOfElements() (which returns the number of sequences in the data set)
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence), SequenceScore.getLogScoreFor(DataSet)

public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
Throws:
Exception - if some of the characteristics could not be defined

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized.
Returns:
true if the instance is initialized, false otherwise
See Also:
SequenceScore.getLogScoreFor(Sequence)

protected void finalize() throws Throwable
finalize in class AbstractHMM
Throws:
Throwable

public void samplePath(IntList path, int startPos, int endPos, Sequence seq) throws Exception
This method samples a valid path for the given sequence seq using the internal parameters.

public Emission[] getEmissions() throws CloneNotSupportedException
Returns a clone of the internal emissions.
Throws:
CloneNotSupportedException - if the emissions could not be cloned

public TransitionElement[] getTransisionElements() throws CloneNotSupportedException
Returns the transition elements of the internal Transition.
Throws:
CloneNotSupportedException - if the transition elements could not be cloned
See Also:
HigherOrderTransition.getTransisionElements()

public int[] getEmissionIndexes()
Returns a clone of the internal array of emission indexes that represent which emission is used in which state.

public String[] getNames()
Returns a clone of the state names.

public HMMTrainingParameterSet getTrainingParams() throws CloneNotSupportedException
Returns a clone of the training parameters.
Throws:
CloneNotSupportedException - if the parameters could not be cloned
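
Several getters of this class return clones of internal data, and clone() follows the convention of calling super.clone() and then deeply copying every member that is not of a simple data type. The following sketch shows that pattern on a hypothetical class CloneSketch, which stands in for the class X of the convention; it is not taken from Jstacs, and its fields and methods are invented for illustration.

```java
// Sketch only: CloneSketch is a hypothetical stand-in for the class X
// of the clone() convention; it is not part of Jstacs.
final class CloneSketch implements Cloneable {

    private int order;           // simple data type: copied by super.clone()
    private double[] parameters; // array: has to be copied explicitly
    private String[] names;      // array: has to be copied explicitly

    CloneSketch(int order, double[] parameters, String[] names) {
        this.order = order;
        this.parameters = parameters.clone();
        this.names = names.clone();
    }

    @Override
    public CloneSketch clone() throws CloneNotSupportedException {
        // Step 1: let Object's clone() create the shallow copy.
        CloneSketch copy = (CloneSketch) super.clone();
        // Step 2: deeply copy all members that are not of a simple data type.
        copy.parameters = parameters.clone();
        copy.names = names.clone();
        // Step 3: return the copy.
        return copy;
    }

    /** Returns a clone of the names, so callers cannot change internal state. */
    String[] getNames() {
        return names.clone();
    }

    int getOrder() {
        return order;
    }

    double getParameter(int index) {
        return parameters[index];
    }

    void setParameter(int index, double value) {
        parameters[index] = value;
    }
}
```

Because the arrays are copied in clone() and in the getter, changing the original instance or a returned array leaves the clone and the internal state untouched.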