de.jstacs.scoringFunctions.directedGraphicalModels
Class BayesianNetworkScoringFunction

java.lang.Object
  extended by de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
      extended by de.jstacs.scoringFunctions.directedGraphicalModels.BayesianNetworkScoringFunction
All Implemented Interfaces:
NormalizableScoringFunction, ScoringFunction, Storable, Cloneable

public class BayesianNetworkScoringFunction
extends AbstractNormalizableScoringFunction

This class implements a scoring function that is a moral directed graphical model, i.e. a moral Bayesian network. This implementation also comprises well-known specializations of Bayesian networks, such as Markov models of arbitrary order (including weight array matrix models and position weight matrices) and Bayesian trees. Different structures are obtained by supplying the corresponding Measure, e.g. InhomogeneousMarkov for Markov models of arbitrary order.
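
To illustrate how the order of an inhomogeneous Markov model translates into a network structure, the following standalone sketch lists the parent positions of each sequence position. It does not use Jstacs; the class MarkovParentsSketch and its method parents are hypothetical names chosen for this illustration, and the internal representation used by this class may differ.

```java
import java.util.Arrays;

// Standalone illustration, not part of Jstacs.
final class MarkovParentsSketch {

    // For each position, returns the indices of its parent positions in an
    // inhomogeneous Markov model of the given order: the (up to) "order"
    // directly preceding positions.
    static int[][] parents(int length, int order) {
        int[][] result = new int[length][];
        for (int pos = 0; pos < length; pos++) {
            int first = Math.max(0, pos - order);
            result[pos] = new int[pos - first];
            for (int p = first; p < pos; p++) {
                result[pos][p - first] = p;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Order 0 (PWM): no position has parents.
        System.out.println(Arrays.deepToString(parents(4, 0)));
        // Order 1 (WAM): each position depends on its predecessor.
        System.out.println(Arrays.deepToString(parents(4, 1)));
    }
}
```

In such a structure all parents of a position are themselves connected by edges, which is what makes the graph moral.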

This scoring function can be used in any ScoreClassifier, e.g. in a CLLClassifier to learn the parameters of the ScoringFunction using maximum conditional likelihood.

Author:
Jan Grau

Field Summary
protected  double ess
          The equivalent sample size
protected  boolean isTrained
          Indicates if the instance has been trained
protected  Double normalizationConstant
          Normalization constant to obtain normalized probabilities
protected  Integer numFreePars
          The number of free parameters.
protected  int[] nums
          Used internally.
protected  int[][] order
          network structure, used internally
protected  Parameter[] parameters
          The parameters of the scoring function.
protected  boolean plugInParameters
          Indicates if plug-in parameters, i.e. generative (MAP) parameters shall be used upon initialization
protected  Measure structureMeasure
          Measure that defines the network structure
protected  ParameterTree[] trees
          The trees that represent the context of the random variable (i.e. configuration of parent random variables) of the parameters.
 
Fields inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
alphabets, length, r
 
Fields inherited from interface de.jstacs.scoringFunctions.ScoringFunction
UNKNOWN
 
Constructor Summary
BayesianNetworkScoringFunction(AlphabetContainer alphabet, int length, double ess, boolean plugInParameters, Measure structureMeasure)
          Creates a new BayesianNetworkScoringFunction that has neither been initialized nor trained.
BayesianNetworkScoringFunction(StringBuffer xml)
          Re-creates a BayesianNetworkScoringFunction from its XML-representation, as saved by the toXML() method.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of getLogPriorTerm() for each parameter of this model.
 BayesianNetworkScoringFunction clone()
          Creates a clone (deep copy) of the current ScoringFunction instance.
protected  void createTrees(Sample[] data2, double[][] weights2)
          Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure.
protected  void fromXML(StringBuffer source)
          This method is called in the constructor to create a scoring function from a StringBuffer
 double[] getCurrentParameterValues()
          Returns a double array of dimension getNumberOfParameters() containing the current parameter values.
 double getEss()
          Returns the equivalent sample size of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
 String getInstanceName()
          Returns a short instance name.
 double getLogPriorTerm()
          This method computes a value that is proportional to getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior ).
 double getLogScore(Sequence seq, int start)
          Returns the log score for the sequence
 double getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList partialDer)
          Returns the log score for the sequence and fills the lists with the indices and the partial derivatives.
 double getNormalizationConstant()
          Returns the sum of the scores over all sequences of the event space.
 int getNumberOfParameters()
          The number of parameters in this scoring function.
 double getPartialNormalizationConstant(int parameterIndex)
          Returns the partial normalization constant for the parameter with index parameterIndex.
 int getPositionForParameter(int index)
          Returns the position in the sequence that the parameter with the given index is responsible for.
 double[][] getPWM()
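          If this BayesianNetworkScoringFunction is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array.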
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by the parameter with the given index.
 void initializeFunction(int index, boolean freeParams, Sample[] data, double[][] weights)
          This method creates the underlying structure of the scoring function.
 void initializeFunctionRandomly(boolean freeParams)
          This method initializes the scoring function randomly.
 boolean isInitialized()
          This method can be used to determine whether the model is initialized.
protected  void precomputeNormalization()
          Precomputes all normalization constants and saves the global normalization constant to normalizationConstant.
 void setParameters(double[] params, int start)
          This method sets the internal parameters to the values of params between start and start + this.getNumberOfParameters() - 1
protected  void setPlugInParameters(int index, boolean freeParameters, Sample[] data, double[][] weights)
          Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights.
 String toString()
           
 StringBuffer toXML()
          This method returns an XML-representation of an instance of the implementing class.
 
Methods inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
getAlphabetContainer, getInitialClassParam, getLength, getLogScore, getLogScoreAndPartialDerivation, getNumberOfRecommendedStarts, isNormalized, isNormalized
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

parameters

protected Parameter[] parameters
The parameters of the scoring function. This comprises free as well as dependent parameters.


trees

protected ParameterTree[] trees
The trees that represent the context of the random variable (i.e. configuration of parent random variables) of the parameters.


isTrained

protected boolean isTrained
Indicates if the instance has been trained


ess

protected double ess
The equivalent sample size


numFreePars

protected Integer numFreePars
The number of free parameters.


nums

protected int[] nums
Used internally. Mapping from indexes of free parameters to indexes of all parameters.


structureMeasure

protected Measure structureMeasure
Measure that defines the network structure


plugInParameters

protected boolean plugInParameters
Indicates if plug-in parameters, i.e. generative (MAP) parameters shall be used upon initialization


order

protected int[][] order
network structure, used internally


normalizationConstant

protected Double normalizationConstant
Normalization constant to obtain normalized probabilities

Constructor Detail

BayesianNetworkScoringFunction

public BayesianNetworkScoringFunction(AlphabetContainer alphabet,
                                      int length,
                                      double ess,
                                      boolean plugInParameters,
                                      Measure structureMeasure)
                               throws Exception
Creates a new BayesianNetworkScoringFunction that has neither been initialized nor trained.

Parameters:
alphabet - the alphabet of the scoring function boxed in an AlphabetContainer, e.g. new AlphabetContainer(new DNAAlphabet())
length - the length of the scoring function, i.e. the length of the sequences this scoring function can handle
ess - the equivalent sample size
plugInParameters - indicates if plug-in parameters, i.e. generative (MAP) parameters, shall be used upon initialization
structureMeasure - the Measure used for the structure, e.g. InhomogeneousMarkov
Throws:
Exception - an exception is thrown if the length of the scoring function is not admissible (<=0), or the alphabet is not discrete

BayesianNetworkScoringFunction

public BayesianNetworkScoringFunction(StringBuffer xml)
                               throws NonParsableException
Re-creates a BayesianNetworkScoringFunction from its XML-representation, as saved by the toXML() method.

Parameters:
xml - the XML-representation
Throws:
NonParsableException - if the XML code could not be parsed
Method Detail

clone

public BayesianNetworkScoringFunction clone()
                                     throws CloneNotSupportedException
Description copied from interface: ScoringFunction
Creates a clone (deep copy) of the current ScoringFunction instance.

Specified by:
clone in interface ScoringFunction
Overrides:
clone in class AbstractNormalizableScoringFunction
Returns:
the cloned instance
Throws:
CloneNotSupportedException

getPartialNormalizationConstant

public double getPartialNormalizationConstant(int parameterIndex)
                                       throws Exception
Description copied from interface: NormalizableScoringFunction
Returns the partial normalization constant for the parameter with index parameterIndex. This is the partial derivative of the normalization constant with respect to the parameter with index parameterIndex, \frac{\partial Z(\lambda)}{\partial \lambda_{parameterIndex}}.

Parameters:
parameterIndex - the index of the parameter
Returns:
the partial normalization constant
Throws:
Exception - if something went wrong with the normalization
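
As a worked illustration of the partial normalization constant, the sketch below assumes a PWM-like model in which the score of a sequence is the exponential of the sum of one parameter per position. This log-linear form is an assumption made for the example and is not stated on this page; the class and method names are hypothetical and not part of Jstacs. Under that assumption Z factorizes over positions, and the partial derivative with respect to one parameter is exp(lambda) of that parameter times the per-position sums of all other positions.

```java
// Standalone illustration, not part of Jstacs.
// Assumption: score(x) = exp( sum over positions l of lambda[l][x_l] ).
final class PartialNormalizationSketch {

    // Sum of exp(lambda) over all symbols at one position.
    static double positionSum(double[] lambdaAtPosition) {
        double sum = 0;
        for (double l : lambdaAtPosition) {
            sum += Math.exp(l);
        }
        return sum;
    }

    // Z(lambda): sum of the scores over all sequences, which for this
    // model is the product of the per-position sums.
    static double normalizationConstant(double[][] lambda) {
        double z = 1;
        for (double[] position : lambda) {
            z *= positionSum(position);
        }
        return z;
    }

    // dZ/dlambda[pos][sym]: exp(lambda[pos][sym]) times the per-position
    // sums of all other positions.
    static double partialNormalizationConstant(double[][] lambda, int pos, int sym) {
        double d = Math.exp(lambda[pos][sym]);
        for (int l = 0; l < lambda.length; l++) {
            if (l != pos) {
                d *= positionSum(lambda[l]);
            }
        }
        return d;
    }
}
```

A quick sanity check is to compare the analytic value with a finite difference of normalizationConstant after perturbing a single parameter.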

initializeFunction

public void initializeFunction(int index,
                               boolean freeParams,
                               Sample[] data,
                               double[][] weights)
                        throws Exception
Description copied from interface: ScoringFunction
This method creates the underlying structure of the scoring function.

Parameters:
index - the index of the class the scoring function models
freeParams - if true, the (reduced) parameterization is used
data - the samples
weights - the weights of the sequences in the samples
Throws:
Exception

createTrees

protected void createTrees(Sample[] data2,
                           double[][] weights2)
                    throws Exception
Creates the tree structures that represent the context (array trees) and the parameter objects parameters using the given Measure structureMeasure.

Parameters:
data2 - the data that is used to compute the structure
weights2 - the weights on the sequences in data2
Throws:
Exception - if the structure is not a moral graph, if the lengths of the data and the scoring function do not match, or if other problems concerning the data occur

setPlugInParameters

protected void setPlugInParameters(int index,
                                   boolean freeParameters,
                                   Sample[] data,
                                   double[][] weights)
Computes and sets the plug-in parameters (MAP estimated parameters) from data using weights.

Parameters:
index - the index of the class the scoring function is responsible for. The parameters are estimated from data[index] and weights[index].
freeParameters - indicates if only the free parameters or all parameters should be used. This also affects the initialization.
data - the data used for initialization
weights - the weights on the data
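
The following standalone sketch shows one common way to compute smoothed estimates from weighted data for a single position of a PWM: weighted symbol counts plus a pseudocount derived from the equivalent sample size. Distributing ess uniformly over the alphabet is an assumption made for this illustration and may differ from what this class does internally; PlugInEstimateSketch and estimate are hypothetical names.

```java
import java.util.Arrays;

// Standalone illustration, not part of Jstacs.
final class PlugInEstimateSketch {

    // symbols[n] is the encoded symbol (0 .. alphabetSize-1) of sequence n
    // at one fixed position, weights[n] is the weight of that sequence.
    // Assumption: ess is spread uniformly over the alphabet as pseudocounts.
    // If ess is 0 and there is no data, the result contains NaN.
    static double[] estimate(int[] symbols, double[] weights, int alphabetSize, double ess) {
        double[] probs = new double[alphabetSize];
        Arrays.fill(probs, ess / alphabetSize);
        double total = ess;
        for (int n = 0; n < symbols.length; n++) {
            probs[symbols[n]] += weights[n];
            total += weights[n];
        }
        for (int a = 0; a < alphabetSize; a++) {
            probs[a] /= total;
        }
        return probs;
    }
}
```

With symbols {0, 0, 1}, weights {1, 1, 2}, a DNA alphabet of size 4 and ess = 4, the smoothed counts are {3, 3, 1, 1} out of a total of 8.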

fromXML

protected void fromXML(StringBuffer source)
                throws NonParsableException
Description copied from class: AbstractNormalizableScoringFunction
This method is called in the constructor to create a scoring function from a StringBuffer

Specified by:
fromXML in class AbstractNormalizableScoringFunction
Parameters:
source - the XML representation
Throws:
NonParsableException - if the StringBuffer could not be parsed.

toString

public String toString()
Overrides:
toString in class Object

getInstanceName

public String getInstanceName()
Description copied from interface: ScoringFunction
Returns a short instance name.

Returns:
a short instance name

getLogScore

public double getLogScore(Sequence seq,
                          int start)
Description copied from interface: ScoringFunction
Returns the log score for the sequence

Parameters:
seq - the sequence
start - the start position in the sequence
Returns:
the log score for the sequence

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              IntList indices,
                                              DoubleList partialDer)
Description copied from interface: ScoringFunction
Returns the log score for the sequence and fills the lists with the indices and the partial derivatives.

Parameters:
seq - the sequence
start - the start position in the sequence
indices - after method invocation the list should contain the indices i where \frac{\partial \log score(seq)}{\partial \lambda_i} is not zero
partialDer - after method invocation the list should contain the corresponding \frac{\partial \log score(seq)}{\partial \lambda_i}
Returns:
the log score
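
The contract of the two output lists can be sketched for a PWM-like model in which the log score is the sum of one parameter per position, so every parameter that is used has a partial derivative of 1. This log-linear form, the flat parameter layout (position * alphabetSize + symbol), and the use of java.util.List in place of IntList and DoubleList are assumptions made for this standalone illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration, not part of Jstacs.
final class LogScoreGradientSketch {

    // seq holds symbols encoded as 0 .. alphabetSize-1.
    // lambda is laid out as lambda[position * alphabetSize + symbol]
    // (a hypothetical layout chosen for this sketch).
    static double logScoreAndPartialDer(int[] seq, int start, double[] lambda,
            int alphabetSize, List<Integer> indices, List<Double> partialDer) {
        int length = lambda.length / alphabetSize;
        double logScore = 0;
        for (int pos = 0; pos < length; pos++) {
            int idx = pos * alphabetSize + seq[start + pos];
            logScore += lambda[idx];
            // Only parameters with a non-zero derivative are reported.
            indices.add(idx);
            partialDer.add(1.0);
        }
        return logScore;
    }

    public static void main(String[] args) {
        double[] lambda = {0.1, 0.2, 0.3, 0.4, 1, 2, 3, 4};
        List<Integer> indices = new ArrayList<>();
        List<Double> partialDer = new ArrayList<>();
        double s = logScoreAndPartialDer(new int[]{3, 1, 2}, 1, lambda, 4, indices, partialDer);
        System.out.println(s + " " + indices + " " + partialDer);
    }
}
```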

getNormalizationConstant

public double getNormalizationConstant()
                                throws RuntimeException
Description copied from interface: NormalizableScoringFunction
Returns the sum of the scores over all sequences of the event space.

Returns:
the normalization constant Z
Throws:
RuntimeException

getNumberOfParameters

public int getNumberOfParameters()
Description copied from interface: ScoringFunction
The number of parameters in this scoring function. If the number of parameters is not known yet, the method returns UNKNOWN.

Returns:
the number of parameters in this scoring function
See Also:
ScoringFunction.UNKNOWN

setParameters

public void setParameters(double[] params,
                          int start)
Description copied from interface: ScoringFunction
This method sets the internal parameters to the values of params at the indices start through start + this.getNumberOfParameters() - 1 (inclusive).

Parameters:
params - the parameters
start - the start index
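
The index convention can be sketched as follows: a caller keeps the parameters of several scoring functions in one shared vector, and each function reads its own slice beginning at start. The class below is a hypothetical stand-in for illustration only and does not reflect how this class stores its parameters.

```java
// Standalone illustration, not part of Jstacs.
final class SetParametersSketch {

    private final double[] internal;

    SetParametersSketch(int numberOfParameters) {
        internal = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return internal.length;
    }

    // Copies params[start] .. params[start + getNumberOfParameters() - 1]
    // into the internal parameter array.
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, internal, 0, internal.length);
    }

    // Returns a copy so that callers cannot modify the internal state.
    double[] getCurrentParameterValues() {
        return internal.clone();
    }
}
```

For a shared vector {9, 9, 1, 2, 3, 9}, a function with three parameters and start = 2 would take the values {1, 2, 3}.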

precomputeNormalization

protected void precomputeNormalization()
Precomputes all normalization constants and saves the global normalization constant to normalizationConstant.

Throws:
RuntimeException

getCurrentParameterValues

public double[] getCurrentParameterValues()
                                   throws Exception
Description copied from interface: ScoringFunction
Returns a double array of dimension getNumberOfParameters() containing the current parameter values. If one wants to use these parameters to start an optimization, it is highly recommended to invoke ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]) beforehand. After an optimization this method can be used to get the current parameter values.

Returns:
the current parameter values
Throws:
Exception - if no parameters exist yet

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML-representation of an instance of the implementing class.

Returns:
the XML-representation

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: NormalizableScoringFunction
This method computes a value that is proportional to

getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior ),

where prior is the prior for the parameters of this model.

Returns:
getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior )
See Also:
NormalizableScoringFunction.getEss(), NormalizableScoringFunction.getNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
Description copied from interface: NormalizableScoringFunction
This method computes the gradient of getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Parameters:
grad - the gradient
start - the start index in the grad array, where the partial derivatives for the parameters of this model shall be entered
See Also:
NormalizableScoringFunction.getLogPriorTerm()
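
Applying the chain rule to ess*log(Z) + log(prior) gives, for each parameter, ess * (dZ/dlambda_i) / Z plus the derivative of log(prior). The standalone sketch below accumulates these values into grad beginning at start. The concrete prior is not specified on this page, so its gradient is passed in as a hypothetical argument; the proportionality constant is taken to be 1, and all names are illustrative rather than part of Jstacs.

```java
// Standalone illustration, not part of Jstacs.
final class LogPriorGradientSketch {

    // partialZ[i]     : dZ/dlambda_i (compare getPartialNormalizationConstant)
    // gradLogPrior[i] : d log(prior)/dlambda_i (hypothetical input)
    // The results are added to grad, not assigned, so earlier
    // contributions stored in grad are preserved.
    static void addGradientOfLogPriorTerm(double ess, double z, double[] partialZ,
            double[] gradLogPrior, double[] grad, int start) {
        for (int i = 0; i < partialZ.length; i++) {
            grad[start + i] += ess * partialZ[i] / z + gradLogPrior[i];
        }
    }
}
```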

getEss

public double getEss()
Description copied from interface: NormalizableScoringFunction
Returns the equivalent sample size of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Returns:
the equivalent sample size.

getPositionForParameter

public int getPositionForParameter(int index)
Returns the position in the sequence that the parameter with the given index is responsible for.

Parameters:
index - the index of the parameter
Returns:
the position in the sequence

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: NormalizableScoringFunction
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA-alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Parameters:
index - the index of the parameter
Returns:
the size of the event space
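
The numbers quoted above follow from multiplying the alphabet sizes at the affected positions. A minimal standalone sketch, with hypothetical names and with the affected positions passed in explicitly:

```java
// Standalone illustration, not part of Jstacs.
final class EventSpaceSketch {

    // alphabetSizes[pos] is the alphabet size at sequence position pos;
    // affectedPositions lists the positions of the random variables that
    // a parameter affects (the position itself plus its parents).
    static int sizeOfEventSpace(int[] alphabetSizes, int[] affectedPositions) {
        int size = 1;
        for (int pos : affectedPositions) {
            size *= alphabetSizes[pos];
        }
        return size;
    }

    public static void main(String[] args) {
        int[] dna = {4, 4, 4};
        // PWM parameter at position 1: only that position is affected.
        System.out.println(sizeOfEventSpace(dna, new int[]{1}));
        // WAM parameter at position 1: position 1 and its parent 0.
        System.out.println(sizeOfEventSpace(dna, new int[]{0, 1}));
    }
}
```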

initializeFunctionRandomly

public void initializeFunctionRandomly(boolean freeParams)
                                throws Exception
Description copied from interface: ScoringFunction
This method initializes the scoring function randomly. It has to create the underlying structure of the scoring function.

Parameters:
freeParams - if true, the (reduced) parameterization is used
Throws:
Exception

isInitialized

public boolean isInitialized()
Description copied from interface: ScoringFunction
This method can be used to determine whether the model is initialized. If the model is not initialized, you should invoke the method ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]).

Returns:
true if the model is initialized

getPWM

public double[][] getPWM()
                  throws Exception
If this BayesianNetworkScoringFunction is a PWM, i.e. structureMeasure=new InhomogeneousMarkov(0), this method returns the normalized PWM as a double array of dimension AbstractNormalizableScoringFunction.getLength() x size-of-alphabet.

Returns:
the PWM
Throws:
Exception - if this method is called for a BayesianNetworkScoringFunction that is not a PWM
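
What "normalized" means for a PWM can be sketched as follows, assuming unnormalized log-linear parameters lambda[position][symbol]: each row is exponentiated and divided by its sum, so every position holds a probability distribution over the alphabet. The parameterization and the names below are assumptions for this standalone illustration and are not taken from the Jstacs implementation.

```java
// Standalone illustration, not part of Jstacs.
final class PwmSketch {

    // Returns a matrix of dimension length x alphabet size in which
    // each row sums to 1.
    static double[][] normalizedPwm(double[][] lambda) {
        double[][] pwm = new double[lambda.length][];
        for (int pos = 0; pos < lambda.length; pos++) {
            pwm[pos] = new double[lambda[pos].length];
            double sum = 0;
            for (int a = 0; a < lambda[pos].length; a++) {
                pwm[pos][a] = Math.exp(lambda[pos][a]);
                sum += pwm[pos][a];
            }
            for (int a = 0; a < lambda[pos].length; a++) {
                pwm[pos][a] /= sum;
            }
        }
        return pwm;
    }
}
```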