public class DifferentiableHigherOrderHMM extends HigherOrderHMM implements SamplingDifferentiableStatisticalModel
This class combines a HigherOrderHMM and a DifferentiableStatisticalModel by implementing some of the declared methods.

Nested classes inherited from class HigherOrderHMM: HigherOrderHMM.Type

| Modifier and Type | Field and Description |
|---|---|
| protected double | ess - The equivalent sample size used for the prior |
| protected double[][][] | gradient - Helper array for the gradient |
| protected int[][] | index - Index array used for computing the gradient |
| protected IntList[] | indicesState - Helper array for the indexes of the parameters of the states |
| protected IntList[] | indicesTransition - Helper array for the indexes of the parameters of the transition |
| protected int | numberOfParameters - The number of parameters of this HMM |
| protected DoubleList[] | partDerState - Helper array for the derivatives of the parameters of the states |
| protected DoubleList[] | partDerTransition - Helper array for the derivatives of the parameters of the transition |
| protected HigherOrderHMM.Type | score - The type of the score that is evaluated |

Fields inherited from class HigherOrderHMM: backwardIntermediate, container, logEmission, numberOfSummands, skipInit, stateList

Fields inherited from class AbstractHMM: bwdMatrix, emission, emissionIdx, finalState, forward, fwdMatrix, name, sostream, START_NODE, states, threads, trainingParameter, transition

Fields inherited from class AbstractTrainableStatisticalModel: alphabets, length

Fields inherited from interface DifferentiableSequenceScore: UNKNOWN

| Constructor and Description |
|---|
| DifferentiableHigherOrderHMM(MaxHMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, DifferentiableEmission[] emission, boolean likelihood, double ess, TransitionElement... te) - This is the main constructor. |
| DifferentiableHigherOrderHMM(StringBuffer xml) - The standard constructor for the interface Storable. |

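The parameter notes for the main constructor (see the constructor details on this page) state that a null emissionIdx makes state i use emission i, and that a null forward makes all states use the forward strand. The following is only a sketch of how such defaults could be resolved. The class StateDefaultsSketch and both helper methods are hypothetical and not part of the library, and the use of IllegalArgumentException for a length mismatch is an assumption (the documented constructor declares a plain Exception).

```java
// Hypothetical helper, not part of the documented API: resolves the
// null-defaults described for emissionIdx and forward.
final class StateDefaultsSketch {

    /** Returns emissionIdx, or the identity mapping (state i uses emission i) if it is null. */
    static int[] resolveEmissionIdx(int[] emissionIdx, int numberOfStates) {
        if (emissionIdx == null) {
            int[] identity = new int[numberOfStates];
            for (int i = 0; i < numberOfStates; i++) {
                identity[i] = i;
            }
            return identity;
        }
        if (emissionIdx.length != numberOfStates) {
            // Assumption: a mismatch is rejected; the exception type is illustrative.
            throw new IllegalArgumentException("emissionIdx must have one entry per state");
        }
        return emissionIdx.clone();
    }

    /** Returns forward, or an all-true array (forward strand for every state) if it is null. */
    static boolean[] resolveForward(boolean[] forward, int numberOfStates) {
        if (forward == null) {
            boolean[] allForward = new boolean[numberOfStates];
            java.util.Arrays.fill(allForward, true);
            return allForward;
        }
        if (forward.length != numberOfStates) {
            throw new IllegalArgumentException("forward must have one entry per state");
        }
        return forward.clone();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(resolveEmissionIdx(null, 3)));
        System.out.println(java.util.Arrays.toString(resolveForward(null, 2)));
    }
}
```

Copying the arrays instead of keeping the caller's references is a design choice of this sketch, not something the page states about the real constructor.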
| Modifier and Type | Method and Description |
|---|---|
| void | addGradientOfLogPriorTerm(double[] grad, int start) - This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. |
| protected void | appendFurtherInformation(StringBuffer xml) - This method appends further information to the XML representation. |
| DifferentiableHigherOrderHMM | clone() - Follows the conventions of Object's clone()-method. |
| protected void | createHelperVariables() - This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ... |
| protected void | createStates() - This method creates states for internal usage. |
| protected void | extractFurtherInformation(StringBuffer xml) - This method extracts further information from the XML representation. |
| double[] | getCurrentParameterValues() - Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. |
| double | getESS() - Returns the equivalent sample size (ess) of this model. |
| double | getInitialClassParam(double classProb) - Returns the initial class parameter for the class this DifferentiableSequenceScore is responsible for, based on the class probability classProb. |
| String | getInstanceName() - Should return a short instance name such as iMM(0), BN(2), ... |
| double | getLogNormalizationConstant() - Returns the logarithm of the sum of the scores over all sequences of the event space. |
| double | getLogPartialNormalizationConstant(int parameterIndex) - Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. |
| double | getLogScoreAndPartialDerivation(Sequence seq, int startPos, int endPos, IntList indices, DoubleList partialDer) |
| double | getLogScoreAndPartialDerivation(Sequence seq, int startPos, IntList indices, DoubleList partialDer) |
| double | getLogScoreAndPartialDerivation(Sequence seq, IntList indices, DoubleList partialDer) - Returns the logarithmic score for a Sequence seq and fills lists with the indices and the partial derivatives. |
| double | getLogScoreFor(Sequence seq) - Returns the logarithmic score for the Sequence seq. |
| double | getLogScoreFor(Sequence seq, int start) |
| double | getLogScoreFor(Sequence seq, int start, int end) |
| int | getNumberOfParameters() - Returns the number of parameters in this DifferentiableSequenceScore. |
| int | getNumberOfRecommendedStarts() - This method returns the number of recommended optimization starts. |
| int[][] | getSamplingGroups(int parameterOffset) - Returns groups of indexes of parameters that shall be drawn together in a sampling procedure. |
| int | getSizeOfEventSpaceForRandomVariablesOfParameter(int index) - Returns the size of the event space of the random variables that are affected by parameter no. index. |
| void | initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights) - This method creates the underlying structure of the DifferentiableSequenceScore. |
| void | initializeFunctionRandomly(boolean freeParams) - This method initializes the DifferentiableSequenceScore randomly. |
| boolean | isInitialized() - This method can be used to determine whether the instance is initialized. |
| boolean | isNormalized() - This method indicates whether the implemented score is already normalized to 1 or not. |
| protected double | logProb(int startpos, int endpos, Sequence sequence) - This method computes the logarithm of the probability of the corresponding subsequences. |
| void | setParameters(double[] params, int start) - This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1. |
| void | train(DataSet data, double[] weights) - Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. |

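The getLogScoreAndPartialDerivation methods report the gradient sparsely: one list holds parameter indices and a second list holds the matching non-zero partial derivatives. As a rough illustration of how a caller might scatter such a pair of lists into a dense gradient array, here is a minimal sketch. It uses java.util.List as a stand-in for the library's IntList and DoubleList, and the class SparseGradientSketch, the method accumulate, and the weight and start arguments are all hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; List<Integer>/List<Double> stand in for IntList/DoubleList.
final class SparseGradientSketch {

    /**
     * Adds weight * partialDer[k] to grad[start + indices[k]] for every entry k.
     * Using += means an index that occurs more than once is summed, not overwritten.
     */
    static void accumulate(double[] grad, int start, double weight,
                           List<Integer> indices, List<Double> partialDer) {
        if (indices.size() != partialDer.size()) {
            throw new IllegalArgumentException("indices and partialDer differ in length");
        }
        for (int k = 0; k < indices.size(); k++) {
            grad[start + indices.get(k)] += weight * partialDer.get(k);
        }
    }

    public static void main(String[] args) {
        double[] grad = new double[5];
        // Parameter 1 appears twice, parameter 3 once; all other entries stay zero.
        accumulate(grad, 0, 2.0, Arrays.asList(1, 3, 1), Arrays.asList(0.5, -2.0, 0.25));
        System.out.println(Arrays.toString(grad)); // [0.0, 1.5, 0.0, -4.0, 0.0]
    }
}
```

Whether the real lists can contain the same index more than once is not stated on this page; the sketch simply handles both cases.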
Methods inherited from class HigherOrderHMM: baumWelch, estimateFromStatistics, fillBwdMatrix, fillBwdOrViterbiMatrix, fillFwdMatrix, fillLogStatePosteriorMatrix, finalize, getCharacteristics, getEmissionIndexes, getEmissions, getLogPriorTerm, getLogProbForPath, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, getNames, getNumericalCharacteristics, getTrainingParams, getTransisionElements, getViterbiPathFor, getXMLTag, initialize, initializeRandomly, resetStatistics, samplePath, setSkiptInit, viterbi

Methods inherited from class AbstractHMM: createMatrixForStatePosterior, decodePath, decodeStatePosterior, determineFinalStates, fromXML, getFinalStatePosterioriMatrix, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getLogProbFor, getLogStatePosteriorMatrixFor, getLogStatePosteriorMatrixFor, getNumberOfStates, getNumberOfThreads, getRunTimeException, getStatePosteriorMatrixFor, getStatePosteriorMatrixFor, getViterbiPathFor, getViterbiPathsFor, initTransition, provideMatrix, setOutputStream, toString, toXML, train

Methods inherited from class AbstractTrainableStatisticalModel: check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, toString

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface DifferentiableStatisticalModel: getLogPriorTerm

Methods inherited from interface StatisticalModel: emitDataSet, getLogProbFor, getLogProbFor, getLogProbFor, getMaximalMarkovOrder

Methods inherited from interface SequenceScore: getAlphabetContainer, getCharacteristics, getLength, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics, toString

protected int numberOfParameters
The number of parameters of this HMM

protected double ess
The equivalent sample size used for the prior

protected HigherOrderHMM.Type score
The type of the score that is evaluated

protected int[][] index
Index array used for computing the gradient

protected double[][][] gradient
Helper array for the gradient

protected IntList[] indicesState
Helper array for the indexes of the parameters of the states

protected IntList[] indicesTransition
Helper array for the indexes of the parameters of the transition

protected DoubleList[] partDerState
Helper array for the derivatives of the parameters of the states

protected DoubleList[] partDerTransition
Helper array for the derivatives of the parameters of the transition

public DifferentiableHigherOrderHMM(MaxHMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, DifferentiableEmission[] emission, boolean likelihood, double ess, TransitionElement... te) throws Exception

This is the main constructor.

Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
likelihood - if true, the likelihood is returned by getLogScoreFor(Sequence), otherwise the Viterbi score
ess - the ess of the model
te - the TransitionElements used for creating a Transition

Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states

See Also:
AlphabetContainer

public DifferentiableHigherOrderHMM(StringBuffer xml) throws NonParsableException

The standard constructor for the interface Storable. Constructs a DifferentiableHigherOrderHMM out of an XML representation.

Parameters:
xml - the XML representation as StringBuffer

Throws:
NonParsableException - if the DifferentiableHigherOrderHMM could not be reconstructed out of the StringBuffer xml

protected void appendFurtherInformation(StringBuffer xml)
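Description copied from class: AbstractHMM
This method appends further information to the XML representation.

appendFurtherInformation in class HigherOrderHMM

Parameters:
xml - the XML representation

protected void extractFurtherInformation(StringBuffer xml) throws NonParsableException

Description copied from class: HigherOrderHMM
This method extracts further information from the XML representation.

extractFurtherInformation in class HigherOrderHMM

Parameters:
xml - the XML representation

Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml

protected void createHelperVariables()

Description copied from class: AbstractHMM
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrix, ...

createHelperVariables in class HigherOrderHMM

protected void createStates()

Description copied from class: AbstractHMM
This method creates states for internal usage.

createStates in class HigherOrderHMM

public DifferentiableHigherOrderHMM clone() throws CloneNotSupportedException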
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

clone in interface DifferentiableSequenceScore
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class HigherOrderHMM

Returns:
a copy of the current AbstractTrainableStatisticalModel (the member-AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o

Throws:
CloneNotSupportedException - if something went wrong while cloning

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model.

getESS in interface DifferentiableStatisticalModel

public void addGradientOfLogPriorTerm(double[] grad, int start) throws Exception

Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

addGradientOfLogPriorTerm in interface DifferentiableStatisticalModel

Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivatives for the parameters of this model shall be entered

Throws:
Exception - if something went wrong while computing the gradients

See Also:
DifferentiableStatisticalModel.getLogPriorTerm()

public int getNumberOfParameters()

Description copied from interface: DifferentiableSequenceScore
Returns the number of parameters in this DifferentiableSequenceScore. If the number of parameters is not known yet, the method returns DifferentiableSequenceScore.UNKNOWN.

getNumberOfParameters in interface DifferentiableSequenceScore

Returns:
the number of parameters in this DifferentiableSequenceScore

See Also:
DifferentiableSequenceScore.UNKNOWN

public int getNumberOfRecommendedStarts()

Description copied from interface: DifferentiableSequenceScore
This method returns the number of recommended optimization starts.

getNumberOfRecommendedStarts in interface DifferentiableSequenceScore

public double[] getCurrentParameterValues() throws Exception

Description copied from interface: DifferentiableSequenceScore
Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. To use these parameters as the starting point of an optimization, it is highly recommended to invoke DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][]) first. After an optimization, this method can be used to get the current parameter values.

getCurrentParameterValues in interface DifferentiableSequenceScore

Throws:
Exception - if no parameters exist (yet)

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized, it should be possible to invoke SequenceScore.getLogScoreFor(Sequence).

isInitialized in interface SequenceScore
isInitialized in class HigherOrderHMM

Returns:
true if the instance is initialized, false otherwise

public void setParameters(double[] params, int start)

Description copied from interface: DifferentiableSequenceScore
This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.

setParameters in interface DifferentiableSequenceScore

Parameters:
params - the new parameters
start - the start index in params

public void initializeFunctionRandomly(boolean freeParams) throws Exception

Description copied from interface: DifferentiableSequenceScore
This method initializes the DifferentiableSequenceScore randomly. It has to create the underlying structure of the DifferentiableSequenceScore.

initializeFunctionRandomly in interface DifferentiableSequenceScore

Parameters:
freeParams - indicates whether the (reduced) parameterization is used

Throws:
Exception - if something went wrong

public void initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights) throws Exception

Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.

initializeFunction in interface DifferentiableSequenceScore

Parameters:
index - the index of the class the DifferentiableSequenceScore models
freeParams - indicates whether the (reduced) parameterization is used
data - the data sets
weights - the weights of the sequences in the data sets

Throws:
Exception - if something went wrong

public void train(DataSet data, double[] weights) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as there are sequences in the data set. (Optionally, weights == null can be used if all weights have the value one.)

The result of the series train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.

train in interface TrainableStatisticalModel
train in class HigherOrderHMM

Parameters:
data - the given sequences as DataSet
weights - the weights of the elements; each weight should be non-negative

Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)

See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator

public boolean isNormalized()
Description copied from interface: DifferentiableStatisticalModel
This method indicates whether the implemented score is already normalized to 1 or not. The standard implementation returns false.

isNormalized in interface DifferentiableStatisticalModel

Returns:
true if the implemented score is already normalized to 1, false otherwise

public double getLogNormalizationConstant()

Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the sum of the scores over all sequences of the event space.

getLogNormalizationConstant in interface DifferentiableStatisticalModel

public double getLogPartialNormalizationConstant(int parameterIndex) throws Exception

Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. This is the logarithm of the partial derivative of the normalization constant with respect to the parameter with index parameterIndex,
\[\log \frac{\partial Z(\underline{\lambda})}{\partial \lambda_{parameterIndex}}\]

getLogPartialNormalizationConstant in interface DifferentiableStatisticalModel

Parameters:
parameterIndex - the index of the parameter

Throws:
Exception - if something went wrong with the normalization

See Also:
DifferentiableStatisticalModel.getLogNormalizationConstant()

public double getInitialClassParam(double classProb)

Description copied from interface: DifferentiableSequenceScore
Returns the initial class parameter for the class this DifferentiableSequenceScore is responsible for, based on the class probability classProb.

getInitialClassParam in interface DifferentiableSequenceScore

Parameters:
classProb - the class probability

public double getLogScoreFor(Sequence seq)
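Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq.

getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel

Parameters:
seq - the sequence

public double getLogScoreFor(Sequence seq, int start)

Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.

getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel

Parameters:
seq - the Sequence
start - the start position in the Sequence

See Also:
Sequence

public double getLogScoreFor(Sequence seq, int start, int end)

Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq between position start and position end (inclusive) in the Sequence.

getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel

Parameters:
seq - the Sequence
start - the start position in the Sequence
end - the end position (inclusive) in the Sequence

See Also:
Sequence

protected double logProb(int startpos, int endpos, Sequence sequence)

Description copied from class: AbstractHMM
This method computes the logarithm of the probability of the corresponding subsequences. The method does not check the AlphabetContainer and possible further features before starting the computation.

logProb in class AbstractHMM

Parameters:
startpos - the start position (inclusive)
endpos - the end position (inclusive)
sequence - the Sequence(s)

public double getLogScoreAndPartialDerivation(Sequence seq, IntList indices, DoubleList partialDer)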
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence seq and fills lists with the indices and the partial derivatives.

getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore

Parameters:
seq - the Sequence
indices - an IntList of indices; after method invocation the list should contain the indices i where \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) is not zero
partialDer - a DoubleList of partial derivatives; after method invocation the list should contain the corresponding \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) that are not zero

See Also:
Sequence

public double getLogScoreAndPartialDerivation(Sequence seq, int startPos, IntList indices, DoubleList partialDer)

Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position startPos in the Sequence and fills lists with the indices and the partial derivatives.

getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore

Parameters:
seq - the Sequence
startPos - the start position in the Sequence
indices - an IntList of indices; after method invocation the list should contain the indices i where \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) is not zero
partialDer - a DoubleList of partial derivatives; after method invocation the list should contain the corresponding \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) that are not zero

See Also:
Sequence

public double getLogScoreAndPartialDerivation(Sequence seq, int startPos, int endPos, IntList indices, DoubleList partialDer)

Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position startPos and ending at position endPos (inclusive) in the Sequence and fills lists with the indices and the partial derivatives.

getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore

Parameters:
seq - the Sequence
startPos - the start position in the Sequence
endPos - the end position (inclusive) in the Sequence
indices - an IntList of indices; after method invocation the list should contain the indices i where \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) is not zero
partialDer - a DoubleList of partial derivatives; after method invocation the list should contain the corresponding \(\frac{\partial \log \mathrm{score}(seq)}{\partial \lambda_i}\) that are not zero

See Also:
Sequence

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
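Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

getSizeOfEventSpaceForRandomVariablesOfParameter in interface DifferentiableStatisticalModel

Parameters:
index - the index of the parameter

public int[][] getSamplingGroups(int parameterOffset)

Description copied from interface: SamplingDifferentiableStatisticalModel
Returns groups of indexes of parameters that shall be drawn together in a sampling procedure.

getSamplingGroups in interface SamplingDifferentiableStatisticalModel

Parameters:
parameterOffset - a global offset on the parameter indexes

Returns:
the groups of parameter indexes, with the internal indexes increased by parameterOffset

public String getInstanceName()

Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

getInstanceName in interface SequenceScore
getInstanceName in class HigherOrderHMM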
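
The description of getSizeOfEventSpaceForRandomVariablesOfParameter above defines the result as the product of the alphabet sizes of all random variables a parameter affects. The arithmetic behind the two DNA examples given there (4 for a PWM, 16 for a WAM beyond position 0) can be sketched as follows. EventSpaceSizeSketch and eventSpaceSize are hypothetical names for this illustration, and passing the alphabet sizes as a plain int array is an assumption, not how the library obtains them.

```java
// Hypothetical illustration of the product rule described for
// getSizeOfEventSpaceForRandomVariablesOfParameter; not library code.
final class EventSpaceSizeSketch {

    /** Multiplies the alphabet sizes of all random variables affected by one parameter. */
    static long eventSpaceSize(int[] alphabetSizes) {
        long size = 1;
        for (int alphabetSize : alphabetSizes) {
            if (alphabetSize <= 0) {
                throw new IllegalArgumentException("alphabet sizes must be positive");
            }
            // multiplyExact throws ArithmeticException instead of overflowing silently.
            size = Math.multiplyExact(size, (long) alphabetSize);
        }
        return size;
    }

    public static void main(String[] args) {
        // PWM parameter on DNA: one position, alphabet size 4.
        System.out.println(eventSpaceSize(new int[] {4}));    // 4
        // WAM parameter on DNA beyond position 0: current and previous position.
        System.out.println(eventSpaceSize(new int[] {4, 4})); // 16
    }
}
```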