java.lang.Object
de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
public abstract class AbstractHMM
This class is the super class of all implementations of hidden Markov models (HMMs) in Jstacs.
The training algorithm of the HMM is determined by a specialized ParameterSet
denoted as HMMTrainingParameterSet.
For creating frequently used HMMs, please check HMMFactory.
See Also: State, Transition, HMMFactory

| Field Summary | |
|---|---|
| protected double[][] | bwdMatrix - matrix for all backward-computed variables; bwdMatrix[l][c] = log P(x_{l+1},...,x_L \| (s_{l-order+1},...,s_l)=c, parameter) |
| protected Emission[] | emission - The emissions used in the states. |
| protected int[] | emissionIdx - The index of the used emission of each state. |
| protected boolean[] | finalState - An array of switches that contains for each state whether it is a final state or not (cf. |
| protected boolean[] | forward - An array of switches that contains for each state whether the emission is on the forward or the reverse strand. |
| protected double[][] | fwdMatrix - matrix for all forward-computed variables; fwdMatrix[l][c] = log P(x_1,...,x_l,(s_{l-order+1},...,s_l)=c \| parameter) |
| protected String[] | name - The names of the states. |
| protected SafeOutputStream | sostream - This is the stream for writing information while training. |
| static String | START_NODE - The String for the start node used in Graphviz annotation. |
| protected State[] | states - The (hidden) states of the HMM. |
| protected int | threads - The number of threads that is internally used. |
| protected HMMTrainingParameterSet | trainingParameter - The ParameterSet containing all Parameters for the training of the HMM. |
| protected Transition | transition - The transitions between all (hidden) states of the HMM. |

| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
|---|
| alphabets, length |

| Constructor Summary | |
|---|---|
| protected | AbstractHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission) - This is the main constructor for an HMM. |
| protected | AbstractHMM(StringBuffer xml) - The standard constructor for the interface Storable. |

| Method Summary | |
|---|---|
| protected abstract void | appendFurtherInformation(StringBuffer xml) - This method appends further information to the XML representation. |
| AbstractHMM | clone() - Follows the conventions of Object's clone()-method. |
| protected abstract void | createHelperVariables() - This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ... |
| protected double[][] | createMatrixForStatePosterior(int startPos, int endPos) - This method creates an empty matrix for the log state posterior. |
| protected abstract void | createStates() - This method creates states for the internal usage. |
| String[] | decodePath(IntList path) - This method decodes any path of the HMM, i.e. |
| static int[][] | decodeStatePosterior(double[][]... statePosterior) - The method returns the decoded state posterior, i.e. |
| protected void | determineFinalStates() - This method determines the final states of the HMM. |
| protected abstract void | extractFurtherInformation(StringBuffer xml) - This method extracts further information from the XML representation. |
| protected abstract void | fillBwdMatrix(int startPos, int endPos, Sequence seq) - This method fills the backward-matrix for a given sequence. |
| protected abstract void | fillFwdMatrix(int startPos, int endPos, Sequence seq) - This method fills the forward-matrix for a given sequence. |
| protected abstract void | fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) - This method fills the log state posterior of Sequence seq in a given matrix. |
| protected void | finalize() |
| protected void | fromXML(StringBuffer xml) - This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation. |
| protected double[][] | getFinalStatePosterioriMatrix(double[][] intermediate) - This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true to eliminate the first row. |
| String | getGraphvizRepresentation(NumberFormat nf) - This method returns a String representation of the structure that can be used in Graphviz to create an image. |
| String | getGraphvizRepresentation(NumberFormat nf, boolean sameTypeSameRank) - This method returns a String representation of the structure that can be used in Graphviz to create an image. |
| String | getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, boolean sameTypeSameRank) - This method returns a String representation of the structure that can be used in Graphviz to create an image. |
| String | getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, HashMap<String,String> rankPatterns) - This method returns a String representation of the structure that can be used in Graphviz to create an image. |
| double | getLogProbFor(Sequence sequence, int startpos, int endpos) - Returns the logarithm of the probability of (a part of) the given sequence given the model. |
| abstract double | getLogProbForPath(IntList path, int startPos, Sequence seq) |
| double[][][] | getLogStatePosteriorMatrixFor(DataSet data) - This method returns the log state posteriors for all sequences of the data set data. |
| double[][] | getLogStatePosteriorMatrixFor(int startPos, int endPos, Sequence seq) - This method returns the log state posterior of all states for a sequence. |
| int | getNumberOfStates() - This method returns the number of the (hidden) states. |
| int | getNumberOfThreads() - This method returns the number of threads that is internally used. |
| protected static RuntimeException | getRunTimeException(Exception e) - This method creates a RuntimeException from any other Exception. |
| double[][][] | getStatePosteriorMatrixFor(DataSet data) - This method returns the state posteriors for all sequences of the data set data. |
| double[][] | getStatePosteriorMatrixFor(Sequence seq) - This method returns the log state posterior of all states for a sequence. |
| abstract Pair<IntList,Double> | getViterbiPathFor(int startPos, int endPos, Sequence seq) |
| Pair<IntList,Double> | getViterbiPathFor(Sequence seq) |
| Pair<IntList,Double>[] | getViterbiPathsFor(DataSet data) - This method returns the Viterbi paths and scores for all sequences of the data set data. |
| protected abstract String | getXMLTag() - Returns the tag for the XML representation. |
| protected void | initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te) - This method creates the internal transition. |
| protected double | logProb(int startpos, int endpos, Sequence sequence) - This method computes the logarithm of the probability of the corresponding subsequences. |
| protected void | provideMatrix(int type, int length) - This method invokes the method createHelperVariables() and provides the matrix with given type. |
| void | setOutputStream(OutputStream o) - Sets the OutputStream that is used e.g. |
| String | toString(NumberFormat nf) - This method returns a String representation of the instance. |
| StringBuffer | toXML() - This method returns an XML representation as StringBuffer of an instance of the implementing class. |
| void | train(DataSet data) - Trains the TrainableStatisticalModel object given the data as DataSet. |

| Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
|---|
| check, emitDataSet, getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel |
|---|
| train |

| Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel |
|---|
| getLogPriorTerm |

| Methods inherited from interface de.jstacs.sequenceScores.SequenceScore |
|---|
| getInstanceName, getNumericalCharacteristics, isInitialized |

| Field Detail |
|---|
protected State[] states
protected String[] name
protected int[] emissionIdx
protected boolean[] forward
See Also: ComplementableDiscreteAlphabet
protected Emission[] emission
protected Transition transition
protected double[][] fwdMatrix
protected double[][] bwdMatrix
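Given the definitions fwdMatrix[l][c] = log P(x_1,...,x_l, context=c | parameter) and bwdMatrix[l][c] = log P(x_{l+1},...,x_L | context=c, parameter), the two entries for the same position l and context c add up to the log joint probability of the whole sequence and that context. Summing over all contexts in probability space then yields the log-likelihood of the sequence. The following is a minimal sketch of that combination on plain arrays; the class name, the method names and the assumption that both matrices share the same row indexing are illustrative and not part of the Jstacs API.

```java
final class ForwardBackwardSketch {

    /** Numerically stable computation of log(sum_i exp(v[i])). */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            // every term has probability zero
            return Double.NEGATIVE_INFINITY;
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Combines row l of a forward and a backward matrix (both in log space):
     * log P(x) = logSumExp over c of (fwd[l][c] + bwd[l][c]).
     * Assumption: both matrices use the same indexing for l and c.
     */
    static double logLikelihood(double[][] fwd, double[][] bwd, int l) {
        double[] joint = new double[fwd[l].length];
        for (int c = 0; c < joint.length; c++) {
            joint[c] = fwd[l][c] + bwd[l][c];
        }
        return logSumExp(joint);
    }
}
```

For a correctly filled pair of matrices the result should not depend on the chosen position l, which can serve as a consistency check.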
protected HMMTrainingParameterSet trainingParameter
The ParameterSet containing all Parameters for the training of the HMM.
protected SafeOutputStream sostream
protected boolean[] finalState
protected int threads
public static final String START_NODE
The String for the start node used in Graphviz annotation.
See Also: getGraphvizRepresentation(NumberFormat), Constant Field Values

| Constructor Detail |
|---|
protected AbstractHMM(HMMTrainingParameterSet trainingParameterSet,
String[] name,
int[] emissionIdx,
boolean[] forward,
Emission[] emission)
throws CloneNotSupportedException,
WrongAlphabetException
This is the main constructor for an HMM.
Parameters:
trainingParameterSet - a ParameterSet containing all Parameters for the training of the HMM
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
Throws:
CloneNotSupportedException - if trainingParameterSet can not be cloned
WrongAlphabetException - if not all (non-silent) emissions use the same AlphabetContainer
protected AbstractHMM(StringBuffer xml)
throws NonParsableException
The standard constructor for the interface Storable.
Constructs an AbstractHMM out of an XML representation.
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AbstractHMM could not be reconstructed out of the StringBuffer xml

| Method Detail |
|---|
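protected void initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te)
throws Exception
This method creates the internal transition.
Parameters:
te - the individual transition elements
Throws:
Exception - if the transition can not handle the current states
protected abstract String getXMLTag()
Returns the tag for the XML representation.
See Also:
fromXML(StringBuffer), toXML()
public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.
toXML in interface Storable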
protected void fromXML(StringBuffer xml)
throws NonParsableException
This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation.
This method should never be made public.
fromXML in class AbstractTrainableStatisticalModel
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the XML representation can not be parsed properly
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)
protected abstract void appendFurtherInformation(StringBuffer xml)
This method appends further information to the XML representation.
Parameters:
xml - the XML representation
protected abstract void extractFurtherInformation(StringBuffer xml)
throws NonParsableException
This method extracts further information from the XML representation.
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml
public AbstractHMM clone()
throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractTrainableStatisticalModel
Returns:
a copy of the current AbstractTrainableStatisticalModel
(the member-AlphabetContainer isn't deeply cloned since
it is assumed to be immutable). The type of the returned object
is defined by the class X directly inherited from
AbstractTrainableStatisticalModel. Hence X's
clone()-method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
protected abstract void createStates()
This method creates states for the internal usage.
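The clone() convention described for AbstractHMM.clone() (call super.clone(), deeply copy every member that is not of a simple data type, return the copy) can be illustrated with a small, self-contained sketch. The classes Base and X below are hypothetical stand-ins, not Jstacs classes, and the members are chosen only for illustration.

```java
final class CloneSketch {

    /** Hypothetical stand-in for a cloneable base class. */
    static class Base implements Cloneable {
        protected int length = 0;

        @Override
        public Base clone() throws CloneNotSupportedException {
            // Object.clone() creates a shallow, field-by-field copy
            return (Base) super.clone();
        }
    }

    /** Hypothetical class X that directly inherits from Base. */
    static class X extends Base {
        int threads = 1;                  // simple data type: copied by super.clone()
        double[] weights = { 0.5, 0.5 };  // array: has to be copied explicitly

        @Override
        public X clone() throws CloneNotSupportedException {
            X o = (X) super.clone();      // step 1: shallow copy via super.clone()
            o.weights = weights.clone();  // step 2: deep copy of non-simple members
            return o;                     // step 3: return the copy
        }
    }
}
```

Without step 2 the original and the clone would share the same array, so that training one instance could silently change the other.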
protected abstract void fillFwdMatrix(int startPos,
int endPos,
Sequence seq)
throws Exception
This method fills the forward-matrix for a given sequence.
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation
protected abstract void fillBwdMatrix(int startPos,
int endPos,
Sequence seq)
throws Exception
This method fills the backward-matrix for a given sequence.
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation
public int getNumberOfThreads()
This method returns the number of threads that is internally used.
public String getGraphvizRepresentation(NumberFormat nf)
This method returns a String representation of the structure that
can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
Returns:
a String representation of the structure
See Also:
getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf,
boolean sameTypeSameRank)
This method returns a String representation of the structure that
can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns:
a String representation of the structure
See Also:
getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf,
DataSet data,
double[] weight,
boolean sameTypeSameRank)
This method returns a String representation of the structure that
can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns:
a String representation of the structure
public String getGraphvizRepresentation(NumberFormat nf,
DataSet data,
double[] weight,
HashMap<String,String> rankPatterns)
This method returns a String representation of the structure that
can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
rankPatterns - a HashMap containing regular expressions and their corresponding value for the option rank in Graphviz
Returns:
a String representation of the structure
See Also:
HMMFactory.getHashMap()
protected double[][] createMatrixForStatePosterior(int startPos,
int endPos)
This method creates an empty matrix for the log state posterior.
Parameters:
startPos - the start position
endPos - the end position
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence),
fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean)
protected abstract void fillLogStatePosteriorMatrix(double[][] statePosterior,
int startPos,
int endPos,
Sequence seq,
boolean silentZero)
throws Exception
This method fills the log state posterior of Sequence seq in a given matrix.
Parameters:
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws:
Exception - if an error occurs during the computation
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence),
createMatrixForStatePosterior(int, int)
public double[][] getLogStatePosteriorMatrixFor(int startPos,
int endPos,
Sequence seq)
throws Exception
This method returns the log state posterior of all states for a sequence.
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
protected double[][] getFinalStatePosterioriMatrix(double[][] intermediate)
This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true
to eliminate the first row.
Parameters:
intermediate - the intermediate (log) state posterior matrix containing one additional row for silent states before the first emission
public double[][] getStatePosteriorMatrixFor(Sequence seq)
throws Exception
Parameters:
seq - the sequence
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)
public double[][][] getLogStatePosteriorMatrixFor(DataSet data)
throws Exception
This method returns the log state posteriors for all sequences of the data set data.
Parameters:
data - the sequences
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)
public double[][][] getStatePosteriorMatrixFor(DataSet data)
throws Exception
This method returns the state posteriors for all sequences of the data set data.
Parameters:
data - the sequences
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getStatePosteriorMatrixFor(Sequence)
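Mathematically, a state posterior is the element-wise exponential of the corresponding log state posterior. The sketch below shows this conversion on a plain double[][]; it is an illustration under that assumption only, the class and method names are hypothetical, and it makes no claim about how the getStatePosteriorMatrixFor methods are implemented internally.

```java
final class StatePosteriorSketch {

    /**
     * Converts a matrix of log state posteriors into state posteriors
     * by exponentiating each entry. The input matrix is left unchanged.
     */
    static double[][] toProbabilities(double[][] logPosterior) {
        double[][] res = new double[logPosterior.length][];
        for (int i = 0; i < logPosterior.length; i++) {
            res[i] = new double[logPosterior[i].length];
            for (int j = 0; j < logPosterior[i].length; j++) {
                // exp(-Infinity) is 0, so impossible states map to probability 0
                res[i][j] = Math.exp(logPosterior[i][j]);
            }
        }
        return res;
    }
}
```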
public abstract Pair<IntList,Double> getViterbiPathFor(int startPos,
int endPos,
Sequence seq)
throws Exception
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
public Pair<IntList,Double> getViterbiPathFor(Sequence seq)
throws Exception
Parameters:
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
See Also:
getViterbiPathFor(int, int, Sequence)
public Pair<IntList,Double>[] getViterbiPathsFor(DataSet data)
throws Exception
This method returns the Viterbi paths and scores for all sequences of the data set data.
Parameters:
data - the sequences
Throws:
Exception - if the Viterbi paths and scores could not be computed, for instance if the model is not trained, ...
See Also:
getViterbiPathFor(Sequence)
public final String[] decodePath(IntList path)
This method decodes any path of the HMM.
Parameters:
path - the path in integer representation
See Also:
getViterbiPathFor(Sequence),
getViterbiPathFor(int, int, Sequence)
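A path in integer representation, such as the one returned by the Viterbi methods, is easier to read once each state index is replaced by a state name. The following sketch assumes that decoding simply means looking up each index in an array of state names (like the name field); it uses a plain int[] instead of IntList, and the class and method are hypothetical rather than the actual decodePath implementation.

```java
final class DecodePathSketch {

    /**
     * Maps each state index of a path to the corresponding state name.
     * Assumption: path[i] is a valid index into the array of names.
     */
    static String[] decode(String[] name, int[] path) {
        String[] res = new String[path.length];
        for (int i = 0; i < path.length; i++) {
            res[i] = name[path[i]];
        }
        return res;
    }
}
```

For example, with the names {"start", "match", "end"} the path {0, 1, 1, 2} would be decoded to "start", "match", "match", "end".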
public abstract double getLogProbForPath(IntList path,
int startPos,
Sequence seq)
throws Exception
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws:
Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...
protected abstract void createHelperVariables()
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ...
protected void provideMatrix(int type,
int length)
This method invokes the method createHelperVariables() and provides the matrix with given type. Type 0 stands for fwdMatrix, and type 1 stands for bwdMatrix.
Parameters:
type - the type of the matrix
length - the maximal sequence length
public int getNumberOfStates()
This method returns the number of the (hidden) states.
public double getLogProbFor(Sequence sequence,
int startpos,
int endpos)
throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model.
This method extends StatisticalModel.getLogProbFor(Sequence, int) by the fact that the model could be
e.g. homogeneous and therefore the length of the sequences, whose
probability should be returned, is not fixed. Additionally, the end
position of the part of the given sequence is given and the probability
of the part from position startpos to endpos
(inclusive) should be returned.
The length and the alphabets define the type of
data that can be modeled and therefore both have to be checked.
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws:
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model
NotTrainedException - if the model is not trained yet
protected static RuntimeException getRunTimeException(Exception e)
This method creates a RuntimeException from any other Exception.
Parameters:
e - the Exception
Returns:
a RuntimeException
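Creating a RuntimeException from an arbitrary Exception is a common way to propagate checked exceptions through code that cannot declare them. A minimal sketch of such a conversion is shown below. Returning an existing RuntimeException unchanged and wrapping everything else with the original as cause is an assumption made for this example, not a statement about the body of getRunTimeException; the class and method names are hypothetical.

```java
final class RuntimeExceptionSketch {

    /**
     * Returns e itself if it already is a RuntimeException,
     * otherwise wraps it so that the original exception is kept as cause.
     */
    static RuntimeException toRuntimeException(Exception e) {
        if (e instanceof RuntimeException) {
            return (RuntimeException) e;
        }
        return new RuntimeException(e);
    }
}
```

Keeping the original exception as cause preserves its stack trace for debugging.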
protected double logProb(int startpos,
int endpos,
Sequence sequence)
throws Exception
AlphabetContainer and possible further features
before starting the computation.
Parameters:
startpos - the start position (inclusive)
endpos - the end position (inclusive)
sequence - the Sequence(s)
Throws:
Exception - if the model has no parameters (for instance if it is not trained)
public void train(DataSet data)
throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet.
The result of the series train(data1); train(data2)
should be a fully trained model over data2 and not over
data1+data2. All parameters of the model were given by the
call of the constructor.
train in interface TrainableStatisticalModel
train in class AbstractTrainableStatisticalModel
Parameters:
data - the given sequences as DataSet
Throws:
Exception - if the training did not succeed
See Also:
DataSet.getElementAt(int),
DataSet.ElementEnumerator
public final void setOutputStream(OutputStream o)
Sets the OutputStream that is used e.g. for writing information
while training. It is possible to set o=null, then nothing
will be written.
Parameters:
o - the OutputStream
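The behaviour described for setOutputStream (a null stream means that nothing is written) can be sketched with a small wrapper that checks for null before every write. The class below is a hypothetical illustration of that idea and does not reproduce the SafeOutputStream class used by the sostream field.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class NullTolerantStreamSketch {

    private OutputStream out; // may be null, in which case all output is discarded

    /** Sets the target stream; passing null switches the output off. */
    void setOutputStream(OutputStream o) {
        this.out = o;
    }

    /** Writes one line of information, or does nothing if no stream is set. */
    void writeln(String message) throws IOException {
        if (out == null) {
            return;
        }
        out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```

Training code can then report progress unconditionally, and callers decide whether the information goes to System.out, a file, or nowhere.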
protected void finalize()
throws Throwable
finalize in class Object
Throws:
Throwable
protected void determineFinalStates()
This method determines the final states of the HMM.
See Also:
finalState
public static int[][] decodeStatePosterior(double[][]... statePosterior)
The method returns the decoded state posterior.
Parameters:
statePosterior - the (log) state posterior(s)
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)
public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.
toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance