public abstract class AbstractHMM extends AbstractTrainableStatisticalModel implements Cloneable, Storable
The training is controlled by a ParameterSet denoted as HMMTrainingParameterSet.
For creating frequently used HMMs please check HMMFactory.
See also: State, Transition, HMMFactory
Modifier and Type | Field and Description |
---|---|
protected double[][] | bwdMatrix: matrix for all backward-computed variables; bwdMatrix[l][c] = log P(x_{l+1},...,x_L \| (s_{l-order+1},...,s_l)=c, parameter) |
protected Emission[] | emission: The emissions used in the states. |
protected int[] | emissionIdx: The index of the used emission of each state. |
protected boolean[] | finalState: An array of switches that contains for each state whether it is a final state or not (cf. |
protected boolean[] | forward: An array of switches that contains for each state whether the emission uses the forward or the reverse strand. |
protected double[][] | fwdMatrix: matrix for all forward-computed variables; fwdMatrix[l][c] = log P(x_1,...,x_l,(s_{l-order+1},...,s_l)=c \| parameter) |
protected String[] | name: The names of the states. |
protected SafeOutputStream | sostream: This is the stream for writing information while training. |
static String | START_NODE: The String for the start node used in Graphviz annotation. |
protected State[] | states: The (hidden) states of the HMM. |
protected int | threads: The number of threads that is internally used. |
protected HMMTrainingParameterSet | trainingParameter: The ParameterSet containing all Parameters for the training of the HMM. |
protected Transition | transition: The transitions between all (hidden) states of the HMM. |
alphabets, length
Modifier | Constructor and Description |
---|---|
protected | AbstractHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission): This is the main constructor for an HMM. |
protected | AbstractHMM(StringBuffer xml): The standard constructor for the interface Storable. |
Modifier and Type | Method and Description |
---|---|
protected abstract void | appendFurtherInformation(StringBuffer xml): This method appends further information to the XML representation. |
AbstractHMM | clone(): Follows the conventions of Object's clone() method. |
protected abstract void | createHelperVariables(): This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ... |
protected double[][] | createMatrixForStatePosterior(int startPos, int endPos): This method creates an empty matrix for the log state posterior. |
protected abstract void | createStates(): This method creates states for the internal usage. |
String[] | decodePath(IntList path): This method decodes any path of the HMM, i.e. |
static int[][] | decodeStatePosterior(double[][]... statePosterior): The method returns the decoded state posterior, i.e. |
protected void | determineFinalStates(): This method determines the final states of the HMM. |
protected abstract void | extractFurtherInformation(StringBuffer xml): This method extracts further information from the XML representation. |
protected abstract void | fillBwdMatrix(int startPos, int endPos, Sequence seq): This method fills the backward-matrix for a given sequence. |
protected abstract void | fillFwdMatrix(int startPos, int endPos, Sequence seq): This method fills the forward-matrix for a given sequence. |
protected abstract void | fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero): This method fills the log state posterior of Sequence seq in a given matrix. |
protected void | finalize() |
protected void | fromXML(StringBuffer xml): This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation. |
protected double[][] | getFinalStatePosterioriMatrix(double[][] intermediate): This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true to eliminate the first row. |
String | getGraphvizRepresentation(NumberFormat nf): This method returns a String representation of the structure that can be used in Graphviz to create an image. |
String | getGraphvizRepresentation(NumberFormat nf, boolean sameTypeSameRank): This method returns a String representation of the structure that can be used in Graphviz to create an image. |
String | getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, boolean sameTypeSameRank): This method returns a String representation of the structure that can be used in Graphviz to create an image. |
String | getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, HashMap<String,String> rankPatterns): This method returns a String representation of the structure that can be used in Graphviz to create an image. |
double | getLogProbFor(Sequence sequence, int startpos, int endpos): Returns the logarithm of the probability of (a part of) the given sequence given the model. |
abstract double | getLogProbForPath(IntList path, int startPos, Sequence seq) |
double[][][] | getLogStatePosteriorMatrixFor(DataSet data): This method returns the log state posteriors for all sequences of the data set data. |
double[][] | getLogStatePosteriorMatrixFor(int startPos, int endPos, Sequence seq): This method returns the log state posterior of all states for a sequence. |
int | getNumberOfStates(): This method returns the number of the (hidden) states. |
int | getNumberOfThreads(): This method returns the number of threads that is internally used. |
protected static RuntimeException | getRunTimeException(Exception e): This method creates a RuntimeException from any other Exception. |
double[][][] | getStatePosteriorMatrixFor(DataSet data): This method returns the state posteriors for all sequences of the data set data. |
double[][] | getStatePosteriorMatrixFor(Sequence seq): This method returns the log state posterior of all states for a sequence. |
abstract Pair<IntList,Double> | getViterbiPathFor(int startPos, int endPos, Sequence seq) |
Pair<IntList,Double> | getViterbiPathFor(Sequence seq) |
Pair<IntList,Double>[] | getViterbiPathsFor(DataSet data): This method returns the Viterbi paths and scores for all sequences of the data set data. |
protected abstract String | getXMLTag(): Returns the tag for the XML representation. |
protected void | initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te): This method creates the internal transition. |
protected double | logProb(int startpos, int endpos, Sequence sequence): This method computes the logarithm of the probability of the corresponding subsequences. |
protected void | provideMatrix(int type, int length): This method invokes the method createHelperVariables() and provides the matrix with given type. |
void | setOutputStream(OutputStream o): Sets the OutputStream that is used e.g. |
String | toString(NumberFormat nf): This method returns a String representation of the instance. |
StringBuffer | toXML(): This method returns an XML representation as StringBuffer of an instance of the implementing class. |
void | train(DataSet data): Trains the TrainableStatisticalModel object given the data as DataSet. |
check, emitDataSet, getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
train
getLogPriorTerm
getInstanceName, getNumericalCharacteristics, isInitialized
protected State[] states
protected String[] name
protected int[] emissionIdx
protected boolean[] forward
See also: ComplementableDiscreteAlphabet
protected Emission[] emission
protected Transition transition
protected double[][] fwdMatrix
protected double[][] bwdMatrix
protected HMMTrainingParameterSet trainingParameter
The ParameterSet containing all Parameters for the training of the HMM.
protected SafeOutputStream sostream
protected boolean[] finalState
protected int threads
protected AbstractHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission) throws CloneNotSupportedException, WrongAlphabetException
This is the main constructor for an HMM.
trainingParameterSet - a ParameterSet containing all Parameters for the training of the HMM
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
Throws: CloneNotSupportedException - if trainingParameterSet can not be cloned
Throws: WrongAlphabetException - if not all (non-silent) emissions use the same AlphabetContainer
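The null defaults described for emissionIdx and forward can be illustrated with a small stand-alone sketch. The class and method names below (HmmConstructorDefaultsSketch, resolveEmissionIdx, resolveForward) are hypothetical and are not part of the documented API; the code only mirrors the documented rule that a null emissionIdx maps state i to emission i and a null forward means every state uses the forward strand.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the documented class: it only
// illustrates the documented meaning of null constructor arguments.
final class HmmConstructorDefaultsSketch {

    /** Returns a copy of emissionIdx, or the identity mapping (state i -> emission i) if it is null. */
    static int[] resolveEmissionIdx(int[] emissionIdx, int numberOfStates) {
        if (emissionIdx != null) {
            return emissionIdx.clone();
        }
        int[] idx = new int[numberOfStates];
        for (int i = 0; i < numberOfStates; i++) {
            idx[i] = i;
        }
        return idx;
    }

    /** Returns a copy of forward, or an all-true array (forward strand everywhere) if it is null. */
    static boolean[] resolveForward(boolean[] forward, int numberOfStates) {
        if (forward != null) {
            return forward.clone();
        }
        boolean[] all = new boolean[numberOfStates];
        Arrays.fill(all, true);
        return all;
    }
}
```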
protected AbstractHMM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Constructs an AbstractHMM out of an XML representation.
xml - the XML representation as StringBuffer
Throws: NonParsableException - if the AbstractHMM could not be reconstructed out of the StringBuffer xml
protected void initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This method creates the internal transition.
te - the individual transition elements
Throws: Exception - if the transition can not handle the current states
protected abstract String getXMLTag()
Returns the tag for the XML representation.
See also: fromXML(StringBuffer), toXML()
public StringBuffer toXML()
Description copied from Storable: This method returns an XML representation as StringBuffer of an instance of the implementing class.
protected void fromXML(StringBuffer xml) throws NonParsableException
This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation. This method should never be made public.
fromXML in class AbstractTrainableStatisticalModel
xml - the XML representation
Throws: NonParsableException - if the XML representation can not be parsed properly
See also: AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)
protected abstract void appendFurtherInformation(StringBuffer xml)
This method appends further information to the XML representation.
xml - the XML representation
protected abstract void extractFurtherInformation(StringBuffer xml) throws NonParsableException
This method extracts further information from the XML representation.
xml - the XML representation
Throws: NonParsableException - if the information could not be reconstructed out of the StringBuffer xml
public AbstractHMM clone() throws CloneNotSupportedException
Description copied from AbstractTrainableStatisticalModel: Follows the conventions of Object's clone() method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractTrainableStatisticalModel
Returns: a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone() method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data types like int, double, ... have to be deeply copied
3. return o
Throws: CloneNotSupportedException - if something went wrong while cloning
protected abstract void createStates()
This method creates states for the internal usage.
protected abstract void fillFwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
This method fills the forward-matrix for a given sequence.
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws: Exception - if some error occurs during the computation
protected abstract void fillBwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
This method fills the backward-matrix for a given sequence.
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws: Exception - if some error occurs during the computation
public int getNumberOfThreads()
This method returns the number of threads that is internally used.
public String getGraphvizRepresentation(NumberFormat nf)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
nf - an instance of NumberFormat for formatting the probabilities of the transition
Returns: String representation of the structure
See also: getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf, boolean sameTypeSameRank)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
nf - an instance of NumberFormat for formatting the probabilities of the transition
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns: String representation of the structure
See also: getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, boolean sameTypeSameRank)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns: String representation of the structure
public String getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, HashMap<String,String> rankPatterns)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
rankPatterns - a HashMap containing regular expressions and their corresponding value for the option rank in Graphviz
Returns: String representation of the structure
See also: HMMFactory.getHashMap()
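To give an idea of what a Graphviz representation of an HMM structure can look like, here is a minimal stand-alone sketch. It is not the implementation behind getGraphvizRepresentation; the class GraphvizSketch, the method toDot, and the plain double[][] transition matrix are assumptions made for illustration. It emits one node per state name and one labelled edge per non-zero transition probability, formatted with the given NumberFormat.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical sketch: builds a DOT description from state names and a
// first-order transition matrix (transition[i][j] = P(j | i)).
final class GraphvizSketch {

    static String toDot(String[] name, double[][] transition, NumberFormat nf) {
        StringBuilder sb = new StringBuilder("digraph {\n");
        // one node per state
        for (String n : name) {
            sb.append("\t\"").append(n).append("\";\n");
        }
        // one edge per transition with non-zero probability
        for (int i = 0; i < name.length; i++) {
            for (int j = 0; j < name.length; j++) {
                if (transition[i][j] > 0) {
                    sb.append("\t\"").append(name[i]).append("\" -> \"").append(name[j])
                      .append("\" [label=\"").append(nf.format(transition[i][j])).append("\"];\n");
                }
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setMaximumFractionDigits(2);
        System.out.print(toDot(new String[] {"A", "B"},
                new double[][] {{0.75, 0.25}, {0.0, 1.0}}, nf));
    }
}
```

The resulting text can be passed to the Graphviz dot tool to render an image.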
protected double[][] createMatrixForStatePosterior(int startPos, int endPos)
This method creates an empty matrix for the log state posterior.
startPos - the start position
endPos - the end position
See also: getLogStatePosteriorMatrixFor(int, int, Sequence), fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean)
protected abstract void fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) throws Exception
This method fills the log state posterior of Sequence seq in a given matrix.
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws: Exception - if an error occurs during the computation
See also: getLogStatePosteriorMatrixFor(int, int, Sequence), createMatrixForStatePosterior(int, int)
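How forward and backward variables combine into a log state posterior can be sketched for the simplest case. The code below is an illustration under stated assumptions, not the implementation of this class: it assumes a first-order HMM without silent states, a non-empty sequence, parameters given as plain log-probability arrays, and a [position][state] layout that follows the fwdMatrix[l][c] notation. All names (ForwardBackwardSketch, logInit, logTrans, logEmit) are hypothetical.

```java
// Hypothetical first-order sketch: log P(s_l = c | x) = fwd[l][c] + bwd[l][c] - log P(x).
final class ForwardBackwardSketch {

    /** Numerically stable log(sum(exp(v))). */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) max = Math.max(max, x);
        if (max == Double.NEGATIVE_INFINITY) return max;
        double sum = 0;
        for (double x : v) sum += Math.exp(x - max);
        return max + Math.log(sum);
    }

    /** fwd[l][c] = log P(x_0,...,x_l, s_l = c). */
    static double[][] forward(double[] logInit, double[][] logTrans, double[][] logEmit, int[] x) {
        int n = logInit.length;
        double[][] fwd = new double[x.length][n];
        double[] tmp = new double[n];
        for (int c = 0; c < n; c++) fwd[0][c] = logInit[c] + logEmit[c][x[0]];
        for (int l = 1; l < x.length; l++) {
            for (int c = 0; c < n; c++) {
                for (int p = 0; p < n; p++) tmp[p] = fwd[l - 1][p] + logTrans[p][c];
                fwd[l][c] = logSumExp(tmp) + logEmit[c][x[l]];
            }
        }
        return fwd;
    }

    /** bwd[l][c] = log P(x_{l+1},...,x_{L-1} | s_l = c); the last row is log 1 = 0. */
    static double[][] backward(double[][] logTrans, double[][] logEmit, int[] x) {
        int n = logTrans.length;
        double[][] bwd = new double[x.length][n];
        double[] tmp = new double[n];
        for (int l = x.length - 2; l >= 0; l--) {
            for (int c = 0; c < n; c++) {
                for (int q = 0; q < n; q++) {
                    tmp[q] = logTrans[c][q] + logEmit[q][x[l + 1]] + bwd[l + 1][q];
                }
                bwd[l][c] = logSumExp(tmp);
            }
        }
        return bwd;
    }

    /** Combines both matrices into the log state posterior, indexed [position][state]. */
    static double[][] logStatePosterior(double[] logInit, double[][] logTrans,
                                        double[][] logEmit, int[] x) {
        double[][] fwd = forward(logInit, logTrans, logEmit, x);
        double[][] bwd = backward(logTrans, logEmit, x);
        double logProb = logSumExp(fwd[x.length - 1]); // log P(x)
        double[][] post = new double[x.length][logInit.length];
        for (int l = 0; l < x.length; l++) {
            for (int c = 0; c < logInit.length; c++) {
                post[l][c] = fwd[l][c] + bwd[l][c] - logProb;
            }
        }
        return post;
    }
}
```

A useful sanity check is that exponentiating each row of the result yields probabilities that sum to one.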
public double[][] getLogStatePosteriorMatrixFor(int startPos, int endPos, Sequence seq) throws Exception
This method returns the log state posterior of all states for a sequence.
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Throws: Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
protected double[][] getFinalStatePosterioriMatrix(double[][] intermediate)
This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true to eliminate the first row.
intermediate - the intermediate (log) state posterior matrix containing one additional row for silent states before the first emission
public double[][] getStatePosteriorMatrixFor(Sequence seq) throws Exception
seq - the sequence
Throws: Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See also: getLogStatePosteriorMatrixFor(int, int, Sequence)
public double[][][] getLogStatePosteriorMatrixFor(DataSet data) throws Exception
This method returns the log state posteriors for all sequences of the data set data.
data - the sequences
Throws: Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See also: getLogStatePosteriorMatrixFor(int, int, Sequence)
public double[][][] getStatePosteriorMatrixFor(DataSet data) throws Exception
This method returns the state posteriors for all sequences of the data set data.
data - the sequences
Throws: Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See also: getStatePosteriorMatrixFor(Sequence)
public abstract Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq) throws Exception
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns: a Pair containing the Viterbi state path and the corresponding score
Throws: Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
public Pair<IntList,Double> getViterbiPathFor(Sequence seq) throws Exception
seq - the sequence
Returns: a Pair containing the Viterbi state path and the corresponding score
Throws: Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
See also: getViterbiPathFor(int, int, Sequence)
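The idea of returning a state path together with its score can be illustrated with a textbook Viterbi recursion. This is a sketch under assumptions and not the implementation of getViterbiPathFor: it assumes a first-order HMM without silent states, a non-empty sequence, and parameters given as plain log-probability arrays. The class ViterbiSketch and its nested Result type (a stand-in for a path and score pair) are hypothetical.

```java
// Hypothetical first-order Viterbi sketch in log space.
final class ViterbiSketch {

    /** Simple holder for the best state path and its log score. */
    static final class Result {
        final int[] path;
        final double logScore;

        Result(int[] path, double logScore) {
            this.path = path;
            this.logScore = logScore;
        }
    }

    static Result viterbi(double[] logInit, double[][] logTrans, double[][] logEmit, int[] x) {
        int n = logInit.length;
        int len = x.length;
        double[][] score = new double[len][n]; // best log score of a path ending in state c at position l
        int[][] back = new int[len][n];        // predecessor state on that best path
        for (int c = 0; c < n; c++) {
            score[0][c] = logInit[c] + logEmit[c][x[0]];
        }
        for (int l = 1; l < len; l++) {
            for (int c = 0; c < n; c++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < n; p++) {
                    double v = score[l - 1][p] + logTrans[p][c];
                    if (v > best) {
                        best = v;
                        arg = p;
                    }
                }
                score[l][c] = best + logEmit[c][x[l]];
                back[l][c] = arg;
            }
        }
        // pick the best final state and trace the path back
        int last = 0;
        for (int c = 1; c < n; c++) {
            if (score[len - 1][c] > score[len - 1][last]) last = c;
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int l = len - 1; l > 0; l--) {
            path[l - 1] = back[l][path[l]];
        }
        return new Result(path, score[len - 1][last]);
    }
}
```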
public Pair<IntList,Double>[] getViterbiPathsFor(DataSet data) throws Exception
This method returns the Viterbi paths and scores for all sequences of the data set data.
data - the sequences
Throws: Exception - if the Viterbi paths and scores could not be computed, for instance if the model is not trained, ...
See also: getViterbiPathFor(Sequence)
public final String[] decodePath(IntList path)
path - the path in integer representation
See also: getViterbiPathFor(Sequence), getViterbiPathFor(int, int, Sequence)
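Given the String[] return type and the name field holding the state names, decoding a path plausibly means mapping each state index to its name. The sketch below shows that mapping with a plain int[] in place of IntList; the class DecodePathSketch is hypothetical and the interpretation itself is an assumption, since the description of decodePath is cut off in this document.

```java
// Hypothetical sketch: translates a state path given as indices into state names.
final class DecodePathSketch {

    static String[] decodePath(int[] path, String[] name) {
        String[] res = new String[path.length];
        for (int i = 0; i < path.length; i++) {
            res[i] = name[path[i]]; // look up the name of the i-th visited state
        }
        return res;
    }
}
```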
public abstract double getLogProbForPath(IntList path, int startPos, Sequence seq) throws Exception
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws: Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...
protected abstract void createHelperVariables()
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ...
protected void provideMatrix(int type, int length)
This method invokes the method createHelperVariables() and provides the matrix with given type. Type 0 stands for fwdMatrix, and type 1 stands for bwdMatrix.
type - the type of the matrix
length - the maximal sequence length
public int getNumberOfStates()
This method returns the number of the (hidden) states.
public double getLogProbFor(Sequence sequence, int startpos, int endpos) throws Exception
Description copied from StatisticalModel: Returns the logarithm of the probability of (a part of) the given sequence given the model. This method extends StatisticalModel.getLogProbFor(Sequence, int) by the fact that the model could be e.g. homogeneous and therefore the length of the sequences, whose probability should be returned, is not fixed. Additionally, the end position of the part of the given sequence is given and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled and therefore both have to be checked.
getLogProbFor in interface StatisticalModel
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws: Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model
Throws: NotTrainedException - if the model is not trained yet
protected static RuntimeException getRunTimeException(Exception e)
This method creates a RuntimeException from any other Exception.
e - the Exception
Returns: a RuntimeException
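One plausible way to turn an arbitrary Exception into a RuntimeException is sketched below. This is an assumption about the general technique, not the code behind getRunTimeException: the hypothetical method toRuntimeException passes unchecked exceptions through unchanged and wraps checked ones, keeping the message and the original exception as cause.

```java
// Hypothetical sketch of wrapping a checked exception in an unchecked one.
final class RuntimeExceptionSketch {

    static RuntimeException toRuntimeException(Exception e) {
        if (e instanceof RuntimeException) {
            return (RuntimeException) e; // already unchecked, nothing to wrap
        }
        // keep the message and the original exception as the cause
        return new RuntimeException(e.getMessage(), e);
    }
}
```

Such a helper is handy where a method signature does not allow checked exceptions, for example inside worker threads.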
protected double logProb(int startpos, int endpos, Sequence sequence) throws Exception
This method computes the logarithm of the probability of the corresponding subsequences.
[...] AlphabetContainer and possible further features before starting the computation.
public void train(DataSet data) throws Exception
Description copied from TrainableStatisticalModel: Trains the TrainableStatisticalModel object given the data as DataSet. The result of train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
train in interface TrainableStatisticalModel
train in class AbstractTrainableStatisticalModel
data - the given sequences as DataSet
Throws: Exception - if the training did not succeed
See also: DataSet.getElementAt(int), DataSet.ElementEnumerator
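The requirement that train(data1); train(data2) yields a model trained on data2 only can be shown with a toy model. The class RetrainSketch below is hypothetical and unrelated to the HMM classes; it estimates symbol frequencies and starts from fresh counts on every call of train, which is the property the contract asks for.

```java
// Hypothetical toy model: relative symbol frequencies, retrained from scratch on each call.
final class RetrainSketch {
    private final int alphabetSize;
    private double[] prob; // null until the model has been trained

    RetrainSketch(int alphabetSize) {
        this.alphabetSize = alphabetSize;
    }

    void train(int[][] data) {
        double[] counts = new double[alphabetSize]; // fresh counts: earlier data is forgotten
        double total = 0;
        for (int[] seq : data) {
            for (int symbol : seq) {
                counts[symbol]++;
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no symbols to train on");
        }
        for (int i = 0; i < alphabetSize; i++) {
            counts[i] /= total;
        }
        prob = counts;
    }

    double getProb(int symbol) {
        if (prob == null) {
            throw new IllegalStateException("model is not trained yet");
        }
        return prob[symbol];
    }
}
```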
public final void setOutputStream(OutputStream o)
Sets the OutputStream that is used e.g. for writing information while training. It is possible to set o=null, then nothing will be written.
o - the OutputStream
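The behaviour for o=null, where output is silently dropped, can be sketched with a small null-tolerant wrapper. The class NullSafeStreamSketch and its writeln method are hypothetical; they are not the SafeOutputStream used by this class and only illustrate the documented effect of a null stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a writer that ignores all output while no stream is set.
final class NullSafeStreamSketch {
    private OutputStream out; // may be null, meaning "write nothing"

    void setOutputStream(OutputStream o) {
        out = o;
    }

    void writeln(String message) {
        if (out == null) {
            return; // nothing will be written
        }
        try {
            out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```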
protected void finalize() throws Throwable
protected void determineFinalStates()
This method determines the final states of the HMM.
See also: finalState
public static int[][] decodeStatePosterior(double[][]... statePosterior)
statePosterior - the (log) state posterior(s)
See also: getLogStatePosteriorMatrixFor(int, int, Sequence)
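A common way to decode a state posterior is to pick, for each position, the state with the highest posterior. The sketch below does exactly that and should be read as an assumption, since the description of decodeStatePosterior is cut off in this document. It further assumes a [position][state] layout, which may differ from the layout used by this class; the class DecodePosteriorSketch is hypothetical. Because the logarithm is monotone, the result is the same for posteriors and log posteriors.

```java
// Hypothetical sketch: per-position argmax over states for one or more posterior matrices.
final class DecodePosteriorSketch {

    static int[][] decodeStatePosterior(double[][]... statePosterior) {
        int[][] res = new int[statePosterior.length][];
        for (int m = 0; m < statePosterior.length; m++) {
            double[][] post = statePosterior[m]; // assumed layout: post[position][state]
            res[m] = new int[post.length];
            for (int l = 0; l < post.length; l++) {
                int best = 0;
                for (int c = 1; c < post[l].length; c++) {
                    if (post[l][c] > post[l][best]) {
                        best = c;
                    }
                }
                res[m][l] = best; // index of the most probable state at position l
            }
        }
        return res;
    }
}
```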
public String toString(NumberFormat nf)
Description copied from SequenceScore: This method returns a String representation of the instance.
toString in interface SequenceScore
nf - the NumberFormat for the String representation of parameters or probabilities
Returns: a String representation of the instance