public class BayesianNetworkDiffSM extends AbstractDifferentiableStatisticalModel implements InstantiableFromParameterSet, QuickScanningSequenceScore
The network structure is chosen through a Measure, e.g. InhomogeneousMarkov for Markov models of arbitrary order. The model can be used in a ScoreClassifier, e.g. in an MSPClassifier, to learn the parameters of the DifferentiableStatisticalModel using maximum conditional likelihood or maximum supervised posterior.

| Modifier and Type | Field and Description |
|---|---|
| protected double | ess: The equivalent sample size. |
| protected boolean | isTrained: Indicates if the instance has been trained. |
| protected Double | logNormalizationConstant: Normalization constant to obtain normalized probabilities. |
| protected Integer | numFreePars: The number of free parameters. |
| protected int[] | nums: Used internally. |
| protected int[][] | order: The network structure, used internally. |
| protected BNDiffSMParameter[] | parameters: The parameters of the scoring function. |
| protected boolean | plugInParameters: Indicates if plug-in parameters, i.e. generative (MAP) parameters, shall be used upon initialization. |
| protected Measure | structureMeasure: Measure that defines the network structure. |
| protected BNDiffSMParameterTree[] | trees: The trees that represent the context of the random variable. |

Inherited fields: alphabets, length, r; UNKNOWN (from DifferentiableSequenceScore)

| Constructor and Description |
|---|
| BayesianNetworkDiffSM(AlphabetContainer alphabet, int length, double ess, boolean plugInParameters, Measure structureMeasure): Creates a new BayesianNetworkDiffSM that has neither been initialized nor trained. |
| BayesianNetworkDiffSM(BayesianNetworkDiffSMParameterSet parameters): Creates a new BayesianNetworkDiffSM, neither initialized nor trained, from a BayesianNetworkDiffSMParameterSet. |
| BayesianNetworkDiffSM(StringBuffer xml): The standard constructor for the interface Storable. |

| Modifier and Type | Method and Description |
|---|---|
| void | addGradientOfLogPriorTerm(double[] grad, int start): This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. |
| BayesianNetworkDiffSM | clone(): Creates a clone (deep copy) of the current DifferentiableSequenceScore instance. |
| protected void | createTrees(DataSet[] data2, double[][] weights2): Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure. |
| DataSet | emitDataSet(int numberOfSequences, int... seqLength): This method returns a DataSet object containing artificial sequence(s). |
| void | fillInfixScore(int[] seq, int start, int length, double[] scores): Computes the position-wise scores of an infix of the sequence seq (which must be encoded by the same alphabet as this QuickScanningSequenceScore) beginning at start and extending for length positions. |
| protected void | fromXML(StringBuffer source): This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer. |
| InstanceParameterSet | getCurrentParameterSet(): Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class. |
| double[] | getCurrentParameterValues(): Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. |
| double | getESS(): Returns the equivalent sample size (ess) of this model. |
| boolean[][] | getInfixFilter(int kmer, double thresh, int... start): Computes arrays that indicate, for a given set of starting positions and a given k-mer length, if a sequence containing this k-mer may yield a score above threshold, choosing the best-scoring option among all non-specified positions (i.e., those outside the k-mer). |
| String | getInstanceName(): Should return a short instance name such as iMM(0), BN(2), ... |
| double | getLogNormalizationConstant(): Returns the logarithm of the sum of the scores over all sequences of the event space. |
| double | getLogPartialNormalizationConstant(int parameterIndex): Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. |
| double | getLogPriorTerm(): This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ), where prior is the prior for the parameters of this model. |
| double | getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList partialDer): Returns the logarithmic score for a Sequence beginning at position start and fills lists with the indices and the partial derivatives. |
| double | getLogScoreFor(Sequence seq, int start): Returns the logarithmic score for the Sequence seq beginning at position start. |
| byte | getMaximalMarkovOrder(): This method returns the maximal used Markov order, if possible. |
| double | getMaximumScore(): Returns the maximum score that this BayesianNetworkDiffSM can return for an admissible input sequence. |
| int | getNumberOfParameters(): Returns the number of parameters in this DifferentiableSequenceScore. |
| double[] | getPositionDependentKMerProb(Sequence kmer): Returns the probability of kmer for all possible positions in this BayesianNetworkDiffSM starting at position kmer.getLength()-1. |
| int | getPositionForParameter(int index): Returns the position in the sequence the parameter index is responsible for. |
| double[][] | getPWM(): If this BayesianNetworkDiffSM is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array of dimension length x size-of-alphabet. |
| int | getSizeOfEventSpaceForRandomVariablesOfParameter(int index): Returns the size of the event space of the random variables that are affected by parameter no. index. |
| void | initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights): This method creates the underlying structure of the DifferentiableSequenceScore. |
| void | initializeFunctionRandomly(boolean freeParams): This method initializes the DifferentiableSequenceScore randomly. |
| boolean | isInitialized(): This method can be used to determine whether the instance is initialized. |
| protected void | precomputeNormalization(): Pre-computes all normalization constants. |
| void | setParameters(double[] params, int start): This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1. |
| protected void | setPlugInParameters(int index, boolean freeParameters, DataSet[] data, double[][] weights): Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights. |
| String | toHtml(NumberFormat nf): Returns an HTML representation of this BayesianNetworkDiffSM. |
| String | toString(NumberFormat nf): This method returns a String representation of the instance. |
| StringBuffer | toXML(): This method returns an XML representation as StringBuffer of an instance of the implementing class. |

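
The result of getInfixFilter is indexed, in its second dimension, by an integer representation of the k-mer that gives the first k-mer position the highest priority. The following is a minimal sketch of one such encoding, written as a stand-alone toy class rather than Jstacs code. The class name `KmerIndex`, its methods, and the base-`alphabetSize` positional formula are assumptions made for illustration; symbols are assumed to be already encoded as integers in the range 0 to `alphabetSize - 1`.

```java
// Toy sketch, not part of Jstacs: maps a k-mer of integer-encoded symbols to a
// single array index, treating the first position as the most significant digit.
final class KmerIndex {

    /** Encodes the k-mer as a number in base alphabetSize (assumed scheme). */
    static int encode(int[] kmer, int alphabetSize) {
        int index = 0;
        for (int symbol : kmer) {
            if (symbol < 0 || symbol >= alphabetSize) {
                throw new IllegalArgumentException("symbol out of range: " + symbol);
            }
            // Horner scheme: earlier positions end up with larger weights
            index = index * alphabetSize + symbol;
        }
        return index;
    }

    /** Inverse of encode for a k-mer of length k. */
    static int[] decode(int index, int k, int alphabetSize) {
        int[] kmer = new int[k];
        for (int i = k - 1; i >= 0; i--) {
            kmer[i] = index % alphabetSize;
            index /= alphabetSize;
        }
        return kmer;
    }

    public static void main(String[] args) {
        // Alphabet of size 4 with the assumed symbol encoding A=0, C=1, G=2, T=3
        int[] acg = {0, 1, 2};
        int index = encode(acg, 4);
        System.out.println("index of ACG: " + index);
        System.out.println("decoded: " + java.util.Arrays.toString(decode(index, 3, 4)));
    }
}
```

With this scheme, a filter for k-mers of length k over an alphabet of size a needs a^k entries per starting position, which is why such filters are practical only for short k-mers.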
Inherited methods:
getInitialClassParam, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, isNormalized, isNormalized
getAlphabetContainer, getCharacteristics, getLength, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getLogScoreFor, getLogScoreFor, getNumberOfRecommendedStarts, getNumberOfStarts, getNumericalCharacteristics, toString
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
getAlphabetContainer, getCharacteristics, getLength, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics
getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getNumberOfRecommendedStarts

protected BNDiffSMParameter[] parameters
protected BNDiffSMParameterTree[] trees
protected boolean isTrained
protected double ess
protected Integer numFreePars
protected int[] nums
protected boolean plugInParameters
protected int[][] order
protected Double logNormalizationConstant

public BayesianNetworkDiffSM(AlphabetContainer alphabet, int length, double ess, boolean plugInParameters, Measure structureMeasure) throws Exception
Creates a new BayesianNetworkDiffSM that has neither been initialized nor trained.
Parameters:
alphabet - the alphabet of the scoring function boxed in an AlphabetContainer, e.g. new AlphabetContainer(new DNAAlphabet())
length - the length of the scoring function, i.e. the length of the sequences this scoring function can handle
ess - the equivalent sample size
plugInParameters - indicates if plug-in parameters, i.e. generative (MAP) parameters, shall be used upon initialization
structureMeasure - the Measure used for the structure, e.g. InhomogeneousMarkov
Throws:
Exception - if the length of the scoring function is not admissible (<=0) or the alphabet is not discrete

public BayesianNetworkDiffSM(BayesianNetworkDiffSMParameterSet parameters) throws ParameterSetParser.NotInstantiableException, Exception
Creates a new BayesianNetworkDiffSM, neither initialized nor trained, from a BayesianNetworkDiffSMParameterSet.
Parameters:
parameters - the parameter set
Throws:
ParameterSetParser.NotInstantiableException - if the BayesianNetworkDiffSM could not be instantiated from the BayesianNetworkDiffSMParameterSet
Exception - if the length of the scoring function is not admissible (<=0) or the alphabet is not discrete

public BayesianNetworkDiffSM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Recreates a BayesianNetworkDiffSM from its XML representation as saved by the method toXML().
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML code could not be parsed

public BayesianNetworkDiffSM clone() throws CloneNotSupportedException
Description copied from interface: DifferentiableSequenceScore
Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.
Specified by: clone in interface DifferentiableSequenceScore
Specified by: clone in interface SequenceScore
Overrides: clone in class AbstractDifferentiableStatisticalModel
Throws:
CloneNotSupportedException - if something went wrong while cloning the DifferentiableSequenceScore

public double getLogPartialNormalizationConstant(int parameterIndex) throws Exception
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. This is the logarithm of the partial derivative of the normalization constant with respect to the parameter with index parameterIndex,
$$\log \frac{\partial Z(\underline{\lambda})}{\partial \lambda_{\mathrm{parameterIndex}}}$$
Specified by: getLogPartialNormalizationConstant in interface DifferentiableStatisticalModel
Parameters:
parameterIndex - the index of the parameter
Throws:
Exception - if something went wrong with the normalization
See Also: DifferentiableStatisticalModel.getLogNormalizationConstant()

public void initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights) throws Exception
Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.
Specified by: initializeFunction in interface DifferentiableSequenceScore
Parameters:
index - the index of the class the DifferentiableSequenceScore models
freeParams - indicates whether the (reduced) parameterization is used
data - the data sets
weights - the weights of the sequences in the data sets
Throws:
Exception - if something went wrong

protected void createTrees(DataSet[] data2, double[][] weights2) throws Exception
Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure.
Parameters:
data2 - the data that is used to compute the structure
weights2 - the weights on the sequences in data2
Throws:
Exception - if the structure is not a moral graph, if the lengths of the data and the scoring function do not match, or if other problems concerning the data occur

protected void setPlugInParameters(int index, boolean freeParameters, DataSet[] data, double[][] weights)
Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights.
Parameters:
index - the index of the class the scoring function is responsible for; the parameters are estimated from data[index] and weights[index]
freeParameters - indicates if only the free parameters or all parameters should be used; this also affects the initialization
data - the data used for initialization
weights - the weights on the data

protected void fromXML(StringBuffer source) throws NonParsableException
Description copied from class: AbstractDifferentiableSequenceScore
This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.
Specified by: fromXML in class AbstractDifferentiableSequenceScore
Parameters:
source - the XML representation as StringBuffer
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also: AbstractDifferentiableSequenceScore.AbstractDifferentiableSequenceScore(StringBuffer)

public String toString(NumberFormat nf)
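Description copied from interface: SequenceScore
This method returns a String representation of the instance.
Specified by: toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...
Specified by: getInstanceName in interface SequenceScore

public double getLogScoreFor(Sequence seq, int start)
Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.
Specified by: getLogScoreFor in interface SequenceScore
Parameters:
seq - the Sequence
start - the start position in the Sequence
See Also: Sequence

public boolean[][] getInfixFilter(int kmer, double thresh, int... start)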
Description copied from interface: QuickScanningSequenceScore
Computes arrays that indicate, for a given set of starting positions and a given k-mer length, if a sequence containing this k-mer may yield a score above threshold, choosing the best-scoring option among all non-specified positions (i.e., those outside the k-mer).
This method is implemented as an upper bound on the scores: a k-mer may be flagged as scoring above threshold (entry true) although no sequence containing it does, but a k-mer is never flagged as scoring below threshold (entry false) if some sequence containing it scores above threshold.
The returned array is indexed by the starting positions (in the same order as provided in start) in the first dimension. In the second dimension it is indexed by an integer representation of the k-mers that assigns the highest priority to the first k-mer position. The defining formula is not preserved here; its legend names the size of the alphabet, the length of the k-mer (starting at 0 in this case), and the function encoding symbols from the alphabet as integers (see DiscreteAlphabet).
Specified by: getInfixFilter in interface QuickScanningSequenceScore
Parameters:
kmer - the k-mer length
thresh - the threshold
start - the starting position(s)

public double getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList partialDer)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivatives.
Specified by: getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
Parameters:
seq - the Sequence
start - the start position in the Sequence
indices - an IntList of indices; after method invocation the list should contain the indices i for which the partial derivative of the log score with respect to parameter i is not zero
partialDer - a DoubleList of partial derivatives; after method invocation the list should contain the corresponding partial derivatives that are not zero
See Also: Sequence

public double getLogNormalizationConstant() throws RuntimeException
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the sum of the scores over all sequences of the event space.
Specified by: getLogNormalizationConstant in interface DifferentiableStatisticalModel
Throws:
RuntimeException

public int getNumberOfParameters()
Description copied from interface: DifferentiableSequenceScore
Returns the number of parameters in this DifferentiableSequenceScore. If the number of parameters is not known yet, the method returns DifferentiableSequenceScore.UNKNOWN.
Specified by: getNumberOfParameters in interface DifferentiableSequenceScore
See Also: DifferentiableSequenceScore.UNKNOWN

public void setParameters(double[] params, int start)
Description copied from interface: DifferentiableSequenceScore
This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.
Specified by: setParameters in interface DifferentiableSequenceScore
Parameters:
params - the new parameters
start - the start index in params

protected void precomputeNormalization()
Pre-computes all normalization constants.
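
The normalization constant is described for getLogNormalizationConstant() as the logarithm of the sum of the scores over all sequences of the event space. The sketch below illustrates that definition for the simplest structure, a PWM-like model with independent positions, where the sum factorizes over positions. It is a stand-alone toy and not the Jstacs implementation: the class `ToyLogNormalization`, the parameter layout `lambda[position][symbol]`, and the assumption that a sequence's score is the exponential of the sum of its selected parameters are all made up for illustration.

```java
// Toy sketch, not the Jstacs implementation: a PWM-like model in which the
// score of a sequence x is exp(sum over positions l of lambda[l][x[l]]).
final class ToyLogNormalization {

    /** Computes log(sum_i exp(v[i])) stably by factoring out the maximum. */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /** Positions are independent, so log Z is a sum of per-position terms. */
    static double logZ(double[][] lambda) {
        double result = 0.0;
        for (double[] column : lambda) {
            result += logSumExp(column);
        }
        return result;
    }

    /** Reference value: sums the scores of all sequences of the event space explicitly. */
    static double logZBruteForce(double[][] lambda) {
        int length = lambda.length;
        int alphabetSize = lambda[0].length;
        long numberOfSequences = 1;
        for (int l = 0; l < length; l++) {
            numberOfSequences *= alphabetSize;
        }
        double sum = 0.0;
        for (long code = 0; code < numberOfSequences; code++) {
            long rest = code;
            double logScore = 0.0;
            for (int l = 0; l < length; l++) {
                logScore += lambda[l][(int) (rest % alphabetSize)];
                rest /= alphabetSize;
            }
            sum += Math.exp(logScore);
        }
        return Math.log(sum);
    }

    public static void main(String[] args) {
        double[][] lambda = {
            {0.1, -0.3, 0.7, 0.0},
            {1.2, 0.0, -0.5, 0.4},
            {0.0, 0.0, 0.0, 0.0}
        };
        System.out.println("factorized:  " + logZ(lambda));
        System.out.println("brute force: " + logZBruteForce(lambda));
    }
}
```

The brute-force variant enumerates every sequence and is only feasible for tiny models; the factorized variant shows why pre-computing normalization constants per position is attractive when the structure allows it.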

public double[] getCurrentParameterValues() throws Exception
Description copied from interface: DifferentiableSequenceScore
Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. To use these parameters as the starting point of an optimization, it is highly recommended to invoke DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][]) first. After an optimization, this method can be used to get the current parameter values.
Specified by: getCurrentParameterValues in interface DifferentiableSequenceScore
Throws:
Exception - if no parameters exist (yet)

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

public double getLogPriorTerm()
Description copied from interface: DifferentiableStatisticalModel
This method computes a value that is proportional to
DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior )
where prior is the prior for the parameters of this model.
Specified by: getLogPriorTerm in interface DifferentiableStatisticalModel
Specified by: getLogPriorTerm in interface StatisticalModel
Returns:
a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior )
See Also: DifferentiableStatisticalModel.getESS(), DifferentiableStatisticalModel.getLogNormalizationConstant()

public void addGradientOfLogPriorTerm(double[] grad, int start)
Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.
Specified by: addGradientOfLogPriorTerm in interface DifferentiableStatisticalModel
Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivatives for the parameters of this model shall be entered
See Also: DifferentiableStatisticalModel.getLogPriorTerm()

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model.
Specified by: getESS in interface DifferentiableStatisticalModel

public int getPositionForParameter(int index)
Returns the position in the sequence the parameter index is responsible for.
Parameters:
index - the index of the parameter

public double[] getPositionDependentKMerProb(Sequence kmer) throws Exception
Returns the probability of kmer for all possible positions in this BayesianNetworkDiffSM starting at position kmer.getLength()-1.
Parameters:
kmer - the k-mer
Returns:
the probabilities of kmer for positions kmer.getLength()-1 to AbstractDifferentiableSequenceScore.getLength()-1
Throws:
Exception - if the method is called for non-Markov model structures

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...
Specified by: getSizeOfEventSpaceForRandomVariablesOfParameter in interface DifferentiableStatisticalModel
Parameters:
index - the index of the parameter

public void initializeFunctionRandomly(boolean freeParams) throws Exception
Description copied from interface: DifferentiableSequenceScore
This method initializes the DifferentiableSequenceScore randomly. It has to create the underlying structure of the DifferentiableSequenceScore.
Specified by: initializeFunctionRandomly in interface DifferentiableSequenceScore
Parameters:
freeParams - indicates whether the (reduced) parameterization is used
Throws:
Exception - if something went wrong

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized, it should be possible to invoke SequenceScore.getLogScoreFor(Sequence).
Specified by: isInitialized in interface SequenceScore
Returns:
true if the instance is initialized, false otherwise

public double[][] getPWM() throws Exception
If this BayesianNetworkDiffSM is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array of dimension AbstractDifferentiableSequenceScore.getLength() x size-of-alphabet.
Throws:
Exception - if this method is called for a BayesianNetworkDiffSM that is not a PWM

public byte getMaximalMarkovOrder() throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
Specified by: getMaximalMarkovOrder in interface StatisticalModel
Overrides: getMaximalMarkovOrder in class AbstractDifferentiableStatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer

public double getMaximumScore() throws Exception
Returns the maximum score that this BayesianNetworkDiffSM can return for an admissible input sequence. Currently only implemented for position weight matrices.
Throws:
Exception - if the model is of higher order than 0

public InstanceParameterSet getCurrentParameterSet() throws Exception
Description copied from interface: InstantiableFromParameterSet
Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class. If the current instance was not created using an InstanceParameterSet, an equivalent InstanceParameterSet should be returned, so that an instance created using this InstanceParameterSet would be in principle equal to the current instance.
Specified by: getCurrentParameterSet in interface InstantiableFromParameterSet
Throws:
Exception - if the InstanceParameterSet could not be returned

public DataSet emitDataSet(int numberOfSequences, int... seqLength) throws NotTrainedException, Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).
emitDataSet( int n, int l ) should return a data set with n sequences of length l.
emitDataSet( int n, int[] l ) should return a data set with n sequences whose lengths correspond to the entries in the given array l.
emitDataSet( int n ) and emitDataSet( int n, null ) should return a data set with n sequences of the length of the model (SequenceScore.getLength()).
Specified by: emitDataSet in interface StatisticalModel
Overrides: emitDataSet in class AbstractDifferentiableStatisticalModel
Parameters:
numberOfSequences - the number of sequences that should be contained in the returned data set
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also: DataSet

public String toHtml(NumberFormat nf)
Returns an HTML representation of this BayesianNetworkDiffSM.
Parameters:
nf - the number format

public void fillInfixScore(int[] seq, int start, int length, double[] scores)
Description copied from interface: QuickScanningSequenceScore
Computes the position-wise scores of an infix of the sequence seq (which must be encoded by the same alphabet as this QuickScanningSequenceScore) beginning at start and extending for length positions. The scores are computed per position and filled into the provided array scores, which is of the same length as seq. This must be implemented such that ToolBox.sum(double...) applied to scores computed from 0 to seq.length returns the same value as SequenceScore.getLogScoreFor(de.jstacs.data.sequences.Sequence) called on the IntSequence created from seq.
Specified by: fillInfixScore in interface QuickScanningSequenceScore
Parameters:
seq - the sequence
start - the start of the infix
length - the length of the infix
scores - the array of scores to be (partly) filled
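
The contract stated for fillInfixScore, namely that the per-position scores over the full sequence sum to the value of getLogScoreFor, can be illustrated with a small stand-alone sketch. This is not Jstacs code: the class `ToyInfixScore`, its PWM-like parameter layout `logParams[position][symbol]`, and the choice to write the score of sequence position i into `scores[i]` are assumptions made for illustration only.

```java
// Toy sketch, not part of Jstacs: a PWM-like score whose per-position
// contributions can be filled in for an infix of the sequence.
final class ToyInfixScore {

    private final double[][] logParams; // assumed layout: [position][symbol]

    ToyInfixScore(double[][] logParams) {
        this.logParams = logParams;
    }

    /** Fills scores[start .. start+length-1]; other entries are left untouched. */
    void fillInfixScore(int[] seq, int start, int length, double[] scores) {
        for (int i = start; i < start + length; i++) {
            scores[i] = logParams[i][seq[i]];
        }
    }

    /** Total log score of a sequence that has the length of the model. */
    double getLogScoreFor(int[] seq) {
        double sum = 0.0;
        for (int i = 0; i < seq.length; i++) {
            sum += logParams[i][seq[i]];
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyInfixScore model = new ToyInfixScore(new double[][] {
            {-1.0, -2.0, -0.5, -3.0},
            {-0.7, -1.4, -2.1, -0.9},
            {-1.2, -1.2, -1.2, -1.2}
        });
        int[] seq = {2, 0, 3};
        double[] scores = new double[seq.length];
        model.fillInfixScore(seq, 0, seq.length, scores);

        double total = 0.0;
        for (double s : scores) {
            total += s;
        }
        System.out.println("sum of position-wise scores: " + total);
        System.out.println("log score of the sequence:   " + model.getLogScoreFor(seq));
    }
}
```

Because only the requested infix is written, a caller that changes a few positions of a sequence can refresh just those entries and re-sum, which is the kind of reuse a position-wise interface makes possible.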