de.jstacs.sequenceScores.statisticalModels.differentiable.directedGraphicalModels
Class BayesianNetworkDiffSM

java.lang.Object
  extended by de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
      extended by de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
          extended by de.jstacs.sequenceScores.statisticalModels.differentiable.directedGraphicalModels.BayesianNetworkDiffSM
All Implemented Interfaces:
InstantiableFromParameterSet, DifferentiableSequenceScore, SequenceScore, DifferentiableStatisticalModel, StatisticalModel, Storable, Cloneable
Direct Known Subclasses:
MarkovModelDiffSM

public class BayesianNetworkDiffSM
extends AbstractDifferentiableStatisticalModel
implements InstantiableFromParameterSet

This class implements a scoring function that is a moral directed graphical model, i.e. a moral Bayesian network. The implementation also covers well-known specializations of Bayesian networks, such as Markov models of arbitrary order (including weight array matrix models (WAM) and position weight matrices (PWM)) and Bayesian trees. Different structures are obtained by using the corresponding Measure, e.g. InhomogeneousMarkov for Markov models of arbitrary order.

This scoring function can be used in any ScoreClassifier, e.g. in an MSPClassifier, to learn the parameters of the DifferentiableStatisticalModel using maximum conditional likelihood or maximum supervised posterior.
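
As a rough illustration of how the choice of structure determines the context of each position, the following standalone sketch lists the parent positions under an inhomogeneous Markov model of a given order. It does not use any Jstacs class; MarkovStructureSketch and parents are hypothetical names, and the sketch assumes that each position depends on the (up to) order directly preceding positions.

```java
// Standalone illustration; MarkovStructureSketch is a hypothetical name, not a Jstacs class.
final class MarkovStructureSketch {

    // Returns, for each position, the indices of its parent positions, assuming an
    // inhomogeneous Markov model in which position l depends on the (up to) 'order'
    // directly preceding positions.
    static int[][] parents(int length, int order) {
        int[][] parents = new int[length][];
        for (int l = 0; l < length; l++) {
            int first = Math.max(0, l - order);
            parents[l] = new int[l - first];
            for (int p = first; p < l; p++) {
                parents[l][p - first] = p;
            }
        }
        return parents;
    }
}
```

With order 0 no position has parents, which corresponds to a PWM; with order 1 every position except the first has its predecessor as the only parent, which corresponds to a WAM.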

Author:
Jan Grau

Field Summary
protected  double ess
          The equivalent sample size.
protected  boolean isTrained
          Indicates if the instance has been trained.
protected  Double logNormalizationConstant
          Normalization constant to obtain normalized probabilities.
protected  Integer numFreePars
          The number of free parameters.
protected  int[] nums
          Used internally.
protected  int[][] order
          The network structure, used internally.
protected  BNDiffSMParameter[] parameters
          The parameters of the scoring function.
protected  boolean plugInParameters
          Indicates if plug-in parameters, i.e. generative (MAP) parameters, shall be used upon initialization.
protected  Measure structureMeasure
          Measure that defines the network structure.
protected  BNDiffSMParameterTree[] trees
          The trees that represent the context of the random variable (i.e. configuration of parent random variables) of the parameters.
 
Fields inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
alphabets, length, r
 
Fields inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
UNKNOWN
 
Constructor Summary
BayesianNetworkDiffSM(AlphabetContainer alphabet, int length, double ess, boolean plugInParameters, Measure structureMeasure)
          Creates a new BayesianNetworkDiffSM that has neither been initialized nor trained.
BayesianNetworkDiffSM(BayesianNetworkDiffSMParameterSet parameters)
          Creates a new BayesianNetworkDiffSM from a BayesianNetworkDiffSMParameterSet; the new instance has neither been initialized nor trained.
BayesianNetworkDiffSM(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model.
 BayesianNetworkDiffSM clone()
          Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.
protected  void createTrees(DataSet[] data2, double[][] weights2)
          Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure.
 DataSet emitDataSet(int numberOfSequences, int... seqLength)
          This method returns a DataSet object containing artificial sequence(s).
protected  void fromXML(StringBuffer source)
          This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.
 InstanceParameterSet getCurrentParameterSet()
          Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class.
 double[] getCurrentParameterValues()
          Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values.
 double getESS()
          Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 double getLogNormalizationConstant()
          Returns the logarithm of the sum of the scores over all sequences of the event space.
 double getLogPartialNormalizationConstant(int parameterIndex)
          Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex.
 double getLogPriorTerm()
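          This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ), where prior is the prior for the parameters of this model.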
 double getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList partialDer)
          Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivatives.
 double getLogScoreFor(Sequence seq, int start)
          Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 double getMaximumScore()
          Returns the maximum score of this BayesianNetworkDiffSM returned for an admissible input sequence.
 int getNumberOfParameters()
          Returns the number of parameters in this DifferentiableSequenceScore.
 double[] getPositionDependentKMerProb(Sequence kmer)
          Returns the probability of kmer for all possible positions in this BayesianNetworkDiffSM starting at position kmer.getLength()-1.
 int getPositionForParameter(int index)
          Returns the position in the sequence the parameter index is responsible for.
 double[][] getPWM()
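          If this BayesianNetworkDiffSM is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array of dimension AbstractDifferentiableSequenceScore.getLength() x size-of-alphabet.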
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by parameter no. index.
 void initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights)
          This method creates the underlying structure of the DifferentiableSequenceScore.
 void initializeFunctionRandomly(boolean freeParams)
          This method initializes the DifferentiableSequenceScore randomly.
 boolean isInitialized()
          This method can be used to determine whether the instance is initialized.
protected  void precomputeNormalization()
          Pre-computes all normalization constants.
 void setParameters(double[] params, int start)
          This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.
protected  void setPlugInParameters(int index, boolean freeParameters, DataSet[] data, double[][] weights)
          Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights.
 String toHtml(NumberFormat nf)
          Returns an HTML representation of this BayesianNetworkDiffSM.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
getInitialClassParam, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, isNormalized, isNormalized
 
Methods inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getLogScoreFor, getLogScoreFor, getNumberOfRecommendedStarts, getNumberOfStarts, getNumericalCharacteristics
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getNumberOfRecommendedStarts
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics
 

Field Detail

parameters

protected BNDiffSMParameter[] parameters
The parameters of the scoring function. This comprises free as well as dependent parameters.


trees

protected BNDiffSMParameterTree[] trees
The trees that represent the context of the random variable (i.e. configuration of parent random variables) of the parameters.


isTrained

protected boolean isTrained
Indicates if the instance has been trained.


ess

protected double ess
The equivalent sample size.


numFreePars

protected Integer numFreePars
The number of free parameters.


nums

protected int[] nums
Used internally. Mapping from indexes of free parameters to indexes of all parameters.
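
The field is internal, so the following is only a guess at what such a mapping can look like. The standalone sketch builds an index map from free parameters to all parameters for a PWM-like layout with one parameter per position and symbol. Which parameter is treated as dependent is an assumption made here for illustration (the last symbol of each position); FreeParameterIndexSketch and freeToAll are hypothetical names.

```java
// Standalone illustration; not the Jstacs implementation.
final class FreeParameterIndexSketch {

    // Maps the index of each free parameter to its index among all parameters,
    // assuming alphabetSize parameters per position and (hypothetically) that the
    // last parameter of each position is the dependent one.
    static int[] freeToAll(int length, int alphabetSize) {
        int[] nums = new int[length * (alphabetSize - 1)];
        int free = 0;
        for (int l = 0; l < length; l++) {
            for (int a = 0; a < alphabetSize - 1; a++) {
                nums[free++] = l * alphabetSize + a;
            }
        }
        return nums;
    }
}
```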


structureMeasure

protected Measure structureMeasure
Measure that defines the network structure.


plugInParameters

protected boolean plugInParameters
Indicates if plug-in parameters, i.e. generative (MAP) parameters shall be used upon initialization.


order

protected int[][] order
The network structure, used internally.


logNormalizationConstant

protected Double logNormalizationConstant
Normalization constant to obtain normalized probabilities.

Constructor Detail

BayesianNetworkDiffSM

public BayesianNetworkDiffSM(AlphabetContainer alphabet,
                             int length,
                             double ess,
                             boolean plugInParameters,
                             Measure structureMeasure)
                      throws Exception
Creates a new BayesianNetworkDiffSM that has neither been initialized nor trained.

Parameters:
alphabet - the alphabet of the scoring function boxed in an AlphabetContainer, e.g. new AlphabetContainer(new DNAAlphabet())
length - the length of the scoring function, i.e. the length of the sequences this scoring function can handle
ess - the equivalent sample size
plugInParameters - indicates if plug-in parameters, i.e. generative (MAP) parameters, shall be used upon initialization
structureMeasure - the Measure used for the structure, e.g. InhomogeneousMarkov
Throws:
Exception - if the length of the scoring function is not admissible (<=0) or the alphabet is not discrete

BayesianNetworkDiffSM

public BayesianNetworkDiffSM(BayesianNetworkDiffSMParameterSet parameters)
                      throws ParameterSetParser.NotInstantiableException,
                             Exception
Creates a new BayesianNetworkDiffSM from a BayesianNetworkDiffSMParameterSet; the new instance has neither been initialized nor trained.

Parameters:
parameters - the parameter set
Throws:
ParameterSetParser.NotInstantiableException - if the BayesianNetworkDiffSM could not be instantiated from the BayesianNetworkDiffSMParameterSet
Exception - if the length of the scoring function is not admissible (<=0) or the alphabet is not discrete

BayesianNetworkDiffSM

public BayesianNetworkDiffSM(StringBuffer xml)
                      throws NonParsableException
The standard constructor for the interface Storable. Recreates a BayesianNetworkDiffSM from its XML representation as saved by the method toXML().

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML code could not be parsed
Method Detail

clone

public BayesianNetworkDiffSM clone()
                            throws CloneNotSupportedException
Description copied from interface: DifferentiableSequenceScore
Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.

Specified by:
clone in interface DifferentiableSequenceScore
Specified by:
clone in interface SequenceScore
Overrides:
clone in class AbstractDifferentiableStatisticalModel
Returns:
the cloned instance of the current DifferentiableSequenceScore
Throws:
CloneNotSupportedException - if something went wrong while cloning the DifferentiableSequenceScore

getLogPartialNormalizationConstant

public double getLogPartialNormalizationConstant(int parameterIndex)
                                          throws Exception
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. This is the logarithm of the partial derivative of the normalization constant with respect to the parameter with index parameterIndex,
\[\log \frac{\partial Z(\underline{\lambda})}{\partial \lambda_{\mathit{parameterIndex}}}.\]

Specified by:
getLogPartialNormalizationConstant in interface DifferentiableStatisticalModel
Parameters:
parameterIndex - the index of the parameter
Returns:
the logarithm of the partial normalization constant
Throws:
Exception - if something went wrong with the normalization
See Also:
DifferentiableStatisticalModel.getLogNormalizationConstant()
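
To make the quantity concrete, here is a minimal standalone sketch for the simplest case, a PWM-like model. It assumes, as an illustration only, that the score of a sequence is the exponential of the sum of one parameter per position and symbol, so that Z factorizes over positions and the partial derivative with respect to one parameter is exp(lambda) times the per-position sums of all other positions. PartialNormalizationSketch and its methods are hypothetical names and not part of Jstacs.

```java
// Standalone illustration for a PWM-like parameterization; not the Jstacs implementation.
final class PartialNormalizationSketch {

    // Numerically stable log( sum_a exp(v[a]) ).
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) max = Math.max(max, x);
        double sum = 0;
        for (double x : v) sum += Math.exp(x - max);
        return max + Math.log(sum);
    }

    // log Z = sum over positions of log( sum_a exp(lambda[l][a]) ).
    static double logZ(double[][] lambda) {
        double logZ = 0;
        for (double[] position : lambda) logZ += logSumExp(position);
        return logZ;
    }

    // log( dZ / d lambda[position][symbol] )
    //   = lambda[position][symbol] + sum over all other positions of their log-sums.
    static double logPartialZ(double[][] lambda, int position, int symbol) {
        double result = lambda[position][symbol];
        for (int l = 0; l < lambda.length; l++) {
            if (l != position) result += logSumExp(lambda[l]);
        }
        return result;
    }
}
```

Working in log space keeps both quantities finite even when Z itself would overflow or underflow a double.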

initializeFunction

public void initializeFunction(int index,
                               boolean freeParams,
                               DataSet[] data,
                               double[][] weights)
                        throws Exception
Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.

Specified by:
initializeFunction in interface DifferentiableSequenceScore
Parameters:
index - the index of the class the DifferentiableSequenceScore models
freeParams - indicates whether the (reduced) parameterization is used
data - the data sets
weights - the weights of the sequences in the data sets
Throws:
Exception - if something went wrong

createTrees

protected void createTrees(DataSet[] data2,
                           double[][] weights2)
                    throws Exception
Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure.

Parameters:
data2 - the data that is used to compute the structure
weights2 - the weights on the sequences in data2
Throws:
Exception - if the structure is not a moral graph, if the lengths of the data and the scoring function do not match, or if other problems concerning the data occur

setPlugInParameters

protected void setPlugInParameters(int index,
                                   boolean freeParameters,
                                   DataSet[] data,
                                   double[][] weights)
Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights.

Parameters:
index - the index of the class the scoring function is responsible for, the parameters are estimated from data[index] and weights[index]
freeParameters - indicates if only the free parameters or all parameters should be used, this also affects the initialization
data - the data used for initialization
weights - the weights on the data

fromXML

protected void fromXML(StringBuffer source)
                throws NonParsableException
Description copied from class: AbstractDifferentiableSequenceScore
This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.

Specified by:
fromXML in class AbstractDifferentiableSequenceScore
Parameters:
source - the XML representation as StringBuffer
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
AbstractDifferentiableSequenceScore.AbstractDifferentiableSequenceScore(StringBuffer)

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Returns:
a short instance name

getLogScoreFor

public double getLogScoreFor(Sequence seq,
                             int start)
Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.

Specified by:
getLogScoreFor in interface SequenceScore
Parameters:
seq - the Sequence
start - the start position in the Sequence
Returns:
the logarithmic score for the Sequence

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              IntList indices,
                                              DoubleList partialDer)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivatives.

Specified by:
getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
Parameters:
seq - the Sequence
start - the start position in the Sequence
indices - an IntList of indices, after method invocation the list should contain the indices i where $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
partialDer - a DoubleList of partial derivatives, after method invocation the list should contain the corresponding $\frac{\partial \log score(seq)}{\partial \lambda_i}$ that are not zero
Returns:
the logarithmic score for the Sequence
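
The contract of filling two parallel lists can be sketched for a PWM-like model, where the log score is the sum of one parameter per position and each used parameter has partial derivative 1. This is a standalone sketch under that assumption: java.util.List stands in for the IntList and DoubleList of the library, sequences are plain int arrays, and ScoreAndGradientSketch is a hypothetical name.

```java
// Standalone illustration; java.util.List replaces the library's IntList/DoubleList.
final class ScoreAndGradientSketch {

    // Assumes logScore = sum_l lambda[l * alphabetSize + seq[start + l]].
    // Appends the index of every parameter with non-zero partial derivative to
    // 'indices' and the derivative itself (here always 1) to 'partialDer'.
    static double logScoreAndPartialDerivation(double[] lambda, int alphabetSize, int length,
            int[] seq, int start,
            java.util.List<Integer> indices, java.util.List<Double> partialDer) {
        double logScore = 0;
        for (int l = 0; l < length; l++) {
            int index = l * alphabetSize + seq[start + l];
            logScore += lambda[index];
            indices.add(index);    // parameter used by this sequence
            partialDer.add(1.0);   // d logScore / d lambda[index] = 1
        }
        return logScore;
    }
}
```

Only the parameters that the sequence actually touches are reported, so the caller can update a sparse gradient instead of iterating over all parameters.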

getLogNormalizationConstant

public double getLogNormalizationConstant()
                                   throws RuntimeException
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the sum of the scores over all sequences of the event space.

Specified by:
getLogNormalizationConstant in interface DifferentiableStatisticalModel
Returns:
the logarithm of the normalization constant Z
Throws:
RuntimeException
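
The phrase "sum of the scores over all sequences of the event space" can be taken literally for tiny models. The standalone sketch below enumerates every sequence of a PWM-like model and sums exp(log score). It assumes one parameter per position and symbol and is meant only to illustrate the definition; an actual implementation would presumably exploit the factorization of the model instead. LogNormalizationSketch is a hypothetical name.

```java
// Standalone illustration by brute-force enumeration; feasible only for tiny models.
final class LogNormalizationSketch {

    // Returns log( sum over all sequences x of exp( sum_l lambda[l][x_l] ) ).
    static double logZByEnumeration(double[][] lambda) {
        int length = lambda.length;
        int[] seq = new int[length]; // current sequence, starts at all zeros
        double z = 0;
        while (true) {
            double logScore = 0;
            for (int l = 0; l < length; l++) logScore += lambda[l][seq[l]];
            z += Math.exp(logScore);
            // Advance to the next sequence like an odometer.
            int pos = length - 1;
            while (pos >= 0 && ++seq[pos] == lambda[pos].length) {
                seq[pos] = 0;
                pos--;
            }
            if (pos < 0) break; // all sequences visited
        }
        return Math.log(z);
    }
}
```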

getNumberOfParameters

public int getNumberOfParameters()
Description copied from interface: DifferentiableSequenceScore
Returns the number of parameters in this DifferentiableSequenceScore. If the number of parameters is not known yet, the method returns DifferentiableSequenceScore.UNKNOWN.

Specified by:
getNumberOfParameters in interface DifferentiableSequenceScore
Returns:
the number of parameters in this DifferentiableSequenceScore
See Also:
DifferentiableSequenceScore.UNKNOWN

setParameters

public void setParameters(double[] params,
                          int start)
Description copied from interface: DifferentiableSequenceScore
This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.

Specified by:
setParameters in interface DifferentiableSequenceScore
Parameters:
params - the new parameters
start - the start index in params
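
The offset convention can be sketched as follows. This standalone class only mimics the documented behaviour (copy getNumberOfParameters() values beginning at start); SetParametersSketch is a hypothetical name and the real class stores its parameters in BNDiffSMParameter objects rather than a plain array.

```java
// Standalone illustration of the start-offset convention; not the Jstacs implementation.
final class SetParametersSketch {

    private final double[] internal;

    SetParametersSketch(int numberOfParameters) {
        internal = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return internal.length;
    }

    // Copies params[start] .. params[start + getNumberOfParameters() - 1].
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, internal, 0, internal.length);
    }

    // Returns a copy so that callers cannot modify the internal state.
    double[] getCurrentParameterValues() {
        return internal.clone();
    }
}
```

An offset of this kind lets a caller keep the parameters of several scoring functions in one shared array and hand each function its own slice.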

precomputeNormalization

protected void precomputeNormalization()
Pre-computes all normalization constants.


getCurrentParameterValues

public double[] getCurrentParameterValues()
                                   throws Exception
Description copied from interface: DifferentiableSequenceScore
Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. To use these parameters as the starting point of an optimization, it is highly recommended to invoke DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][]) first. After an optimization, this method can be used to get the current parameter values.

Specified by:
getCurrentParameterValues in interface DifferentiableSequenceScore
Returns:
the current parameter values
Throws:
Exception - if no parameters exist (yet)

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: DifferentiableStatisticalModel
This method computes a value that is proportional to

DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior )

where prior is the prior for the parameters of this model.

Specified by:
getLogPriorTerm in interface DifferentiableStatisticalModel
Specified by:
getLogPriorTerm in interface StatisticalModel
Returns:
a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ).
See Also:
DifferentiableStatisticalModel.getESS(), DifferentiableStatisticalModel.getLogNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Specified by:
addGradientOfLogPriorTerm in interface DifferentiableStatisticalModel
Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivatives for the parameters of this model shall be entered
See Also:
DifferentiableStatisticalModel.getLogPriorTerm()
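
The important detail is that the gradient is added to grad, not written over it. The standalone sketch below shows this accumulation with a stand-in prior term, -0.5 * sum of lambda_i^2, chosen only because its gradient (-lambda_i) is easy to verify. It is not the prior used by this class, and PriorGradientSketch is a hypothetical name.

```java
// Standalone illustration of the accumulate-at-offset contract.
// The Gaussian-like term is a stand-in, NOT the prior of BayesianNetworkDiffSM.
final class PriorGradientSketch {

    // Adds the gradient of -0.5 * sum_i lambda[i]^2, i.e. -lambda[i],
    // to grad[start + i]; existing entries of grad are preserved.
    static void addGradientOfLogPriorTerm(double[] lambda, double[] grad, int start) {
        for (int i = 0; i < lambda.length; i++) {
            grad[start + i] += -lambda[i];
        }
    }
}
```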

getESS

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Specified by:
getESS in interface DifferentiableStatisticalModel
Returns:
the equivalent sample size.

getPositionForParameter

public int getPositionForParameter(int index)
Returns the position in the sequence the parameter index is responsible for.

Parameters:
index - the index of the parameter
Returns:
the position in the sequence

getPositionDependentKMerProb

public double[] getPositionDependentKMerProb(Sequence kmer)
                                      throws Exception
Returns the probability of kmer for all possible positions in this BayesianNetworkDiffSM starting at position kmer.getLength()-1.

Parameters:
kmer - the k-mer
Returns:
the position-dependent probabilities of this k-mer for position kmer.getLength()-1 to AbstractDifferentiableSequenceScore.getLength()-1
Throws:
Exception - if the method is called for non-Markov model structures
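
For the order-0 case the result can be sketched directly from a normalized PWM. The standalone code below assumes that entry i of the returned array belongs to the k-mer ending at position kmer.length - 1 + i, which is one reading of the description above. It does not cover higher-order Markov models, and KMerProbabilitySketch is a hypothetical name.

```java
// Standalone illustration for a PWM (Markov order 0); not the Jstacs implementation.
final class KMerProbabilitySketch {

    // pwm[l][a] is the probability of symbol a at position l.
    // Returns one probability per admissible end position k-1 .. length-1.
    static double[] positionDependentKMerProb(double[][] pwm, int[] kmer) {
        int k = kmer.length;
        double[] probs = new double[pwm.length - k + 1];
        for (int end = k - 1; end < pwm.length; end++) {
            double p = 1;
            for (int j = 0; j < k; j++) {
                p *= pwm[end - k + 1 + j][kmer[j]];
            }
            probs[end - (k - 1)] = p;
        }
        return probs;
    }
}
```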

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Specified by:
getSizeOfEventSpaceForRandomVariablesOfParameter in interface DifferentiableStatisticalModel
Parameters:
index - the index of the parameter
Returns:
the size of the event space
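
The numbers quoted above follow from a simple product, sketched here in standalone form. The list of positions affected by a parameter is passed in explicitly because how the class derives it is internal; EventSpaceSketch is a hypothetical name.

```java
// Standalone illustration of the product of alphabet sizes.
final class EventSpaceSketch {

    // alphabetSizes[l] is the alphabet size at position l;
    // affectedPositions lists the positions of the random variables the parameter affects.
    static int sizeOfEventSpace(int[] alphabetSizes, int[] affectedPositions) {
        int size = 1;
        for (int position : affectedPositions) {
            size *= alphabetSizes[position];
        }
        return size;
    }
}
```

For DNA, a PWM parameter affects one position (4), while a WAM parameter at any position other than 0 affects that position and its predecessor (4 * 4 = 16).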

initializeFunctionRandomly

public void initializeFunctionRandomly(boolean freeParams)
                                throws Exception
Description copied from interface: DifferentiableSequenceScore
This method initializes the DifferentiableSequenceScore randomly. It has to create the underlying structure of the DifferentiableSequenceScore.

Specified by:
initializeFunctionRandomly in interface DifferentiableSequenceScore
Parameters:
freeParams - indicates whether the (reduced) parameterization is used
Throws:
Exception - if something went wrong

isInitialized

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized you should be able to invoke SequenceScore.getLogScoreFor(Sequence).

Specified by:
isInitialized in interface SequenceScore
Returns:
true if the instance is initialized, false otherwise

getPWM

public double[][] getPWM()
                  throws Exception
If this BayesianNetworkDiffSM is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array of dimension AbstractDifferentiableSequenceScore.getLength() x size-of-alphabet.

Returns:
the PWM as a two-dimensional array
Throws:
Exception - if this method is called for a BayesianNetworkDiffSM that is not a PWM
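
What "normalized" means can be sketched under the assumption that the model stores one unnormalized log-weight per position and symbol: each row is exponentiated and divided by its sum. This is a standalone illustration with the hypothetical name PwmNormalizationSketch, not the code of this class.

```java
// Standalone illustration; assumes unnormalized log-weights lambda[l][a].
final class PwmNormalizationSketch {

    // Returns a length x alphabet-size array whose rows sum to 1.
    static double[][] toPwm(double[][] lambda) {
        double[][] pwm = new double[lambda.length][];
        for (int l = 0; l < lambda.length; l++) {
            // Subtract the row maximum before exponentiating for numerical stability.
            double max = Double.NEGATIVE_INFINITY;
            for (double v : lambda[l]) max = Math.max(max, v);
            pwm[l] = new double[lambda[l].length];
            double sum = 0;
            for (int a = 0; a < lambda[l].length; a++) {
                pwm[l][a] = Math.exp(lambda[l][a] - max);
                sum += pwm[l][a];
            }
            for (int a = 0; a < lambda[l].length; a++) {
                pwm[l][a] /= sum;
            }
        }
        return pwm;
    }
}
```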

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Overrides:
getMaximalMarkovOrder in class AbstractDifferentiableStatisticalModel
Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer

getMaximumScore

public double getMaximumScore()
                       throws Exception
Returns the maximum score of this BayesianNetworkDiffSM returned for an admissible input sequence. Currently only implemented for position weight matrices.

Returns:
the maximum score
Throws:
Exception - if the model is of higher order than 0
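
The restriction to position weight matrices is plausible because positions are then independent, so the best sequence simply takes the best symbol at every position. A minimal standalone sketch, assuming one log-score parameter per position and symbol (MaximumScoreSketch is a hypothetical name):

```java
// Standalone illustration for a PWM; not the Jstacs implementation.
final class MaximumScoreSketch {

    // Returns the sum over positions of the largest parameter at that position.
    static double maximumLogScore(double[][] lambda) {
        double total = 0;
        for (double[] position : lambda) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : position) best = Math.max(best, v);
            total += best;
        }
        return total;
    }
}
```

For models of higher order the symbols interact through their contexts, so a per-position maximum is no longer sufficient.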

getCurrentParameterSet

public InstanceParameterSet getCurrentParameterSet()
                                            throws Exception
Description copied from interface: InstantiableFromParameterSet
Returns the InstanceParameterSet that has been used to instantiate the current instance of the implementing class. If the current instance was not created using an InstanceParameterSet, an equivalent InstanceParameterSet should be returned, so that an instance created using this InstanceParameterSet would be in principle equal to the current instance.

Specified by:
getCurrentParameterSet in interface InstantiableFromParameterSet
Returns:
the current InstanceParameterSet
Throws:
Exception - if the InstanceParameterSet could not be returned

emitDataSet

public DataSet emitDataSet(int numberOfSequences,
                           int... seqLength)
                    throws NotTrainedException,
                           Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).

There are two different possibilities to create a data set for a model with length 0 (homogeneous models).
  1. emitDataSet( int n, int l ) should return a data set with n sequences of length l.
  2. emitDataSet( int n, int[] l ) should return a data set with n sequences which have a sequence length corresponding to the entry in the given array l.

There are two different possibilities to create a data set for a model with length greater than 0 (inhomogeneous models).
  1. emitDataSet( int n )
  2. emitDataSet( int n, null )
Both should return a data set with n sequences whose length is the length of the model (SequenceScore.getLength()).

The standard implementation throws an Exception.

Specified by:
emitDataSet in interface StatisticalModel
Overrides:
emitDataSet in class AbstractDifferentiableStatisticalModel
Parameters:
numberOfSequences - the number of sequences that should be contained in the returned data set
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also:
DataSet
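
For an inhomogeneous model the sequence length is fixed by the model, so only the number of sequences varies. The standalone sketch below draws sequences from a normalized PWM by inverse-CDF sampling per position. It returns plain int arrays instead of a DataSet, and EmitSketch is a hypothetical name; how this class samples from more general network structures is not shown here.

```java
// Standalone illustration of sampling from a PWM; not the Jstacs implementation.
final class EmitSketch {

    // pwm[l][a] is the probability of symbol a at position l.
    // Returns numberOfSequences sequences, each of length pwm.length.
    static int[][] emit(double[][] pwm, int numberOfSequences, java.util.Random rnd) {
        int[][] data = new int[numberOfSequences][pwm.length];
        for (int[] seq : data) {
            for (int l = 0; l < pwm.length; l++) {
                double u = rnd.nextDouble();
                double cumulative = 0;
                int symbol = pwm[l].length - 1; // fallback guards against rounding errors
                for (int a = 0; a < pwm[l].length; a++) {
                    cumulative += pwm[l][a];
                    if (u < cumulative) {
                        symbol = a;
                        break;
                    }
                }
                seq[l] = symbol;
            }
        }
        return data;
    }
}
```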

toHtml

public String toHtml(NumberFormat nf)
Returns an HTML representation of this BayesianNetworkDiffSM.

Parameters:
nf - the number format
Returns:
the HTML representation