de.jstacs.sequenceScores.statisticalModels.differentiable.directedGraphicalModels
Class BNDiffSMParameterTree

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.differentiable.directedGraphicalModels.BNDiffSMParameterTree
All Implemented Interfaces:
Storable, Cloneable

public class BNDiffSMParameterTree
extends Object
implements Cloneable, Storable

Class for the tree that represents the context of a BNDiffSMParameter in a BayesianNetworkDiffSM.

Author:
Jan Grau

Nested Class Summary
 class BNDiffSMParameterTree.TreeElement
          Class for the nodes of a BNDiffSMParameterTree
 
Constructor Summary
BNDiffSMParameterTree(int pos, int[] contextPoss, AlphabetContainer alphabet, int firstParent, int[] firstChildren)
          Creates a new BNDiffSMParameterTree for the parameters at position pos using the parent positions in contextPoss.
BNDiffSMParameterTree(StringBuffer source)
          Recreates a BNDiffSMParameterTree from its XML representation as returned by toXML().
 
Method Summary
 void addCount(Sequence seq, int start, double count)
          Adds count to the parameter as returned by getParameterFor(Sequence, int).
 void backward(BNDiffSMParameterTree[] trees, int[][] order)
          Computes the backward-part of the normalization constant starting from this BNDiffSMParameterTree.
 BNDiffSMParameterTree clone()
           
 Double computeGammaNorm()
          Computes the Gamma-normalization for the prior.
 void copy(BNDiffSMParameterTree parameterTree)
          Copies the values of the parameters from another BNDiffSMParameterTree.
 void divideByUnfree()
          Divides each of the normalized parameters on a simplex by the last BNDiffSMParameter, which is defined not to be free.
 void drawKLDivergences(double[] kls, double[] weights, double[][][][] contrast, double samples)
          Draws KL-divergences between the distributions given by contrast[i], each weighted by weights[i], and kls.length distributions drawn from a Dirichlet density centered around contrast, i.e.
 void drawKLDivergences(double weight, double[] kls, int startIdx, int endIdx, double[][][] contrast, double samples)
          Draws KL-divergences between the distribution given by contrast and endIdx-startIdx distributions drawn from a Dirichlet density centered around contrast, i.e.
 void fill(double[] weight, double[][][][] distribution)
          Fills all parameters with the probabilities given in distribution.
 double forward(BNDiffSMParameterTree[] trees)
          Computes the forward-part of the normalization constant starting from this BNDiffSMParameterTree.
 int getFirstParent()
          Returns the first parent of the random variable of this BNDiffSMParameterTree in the topological ordering of the network structure of the enclosing BayesianNetworkDiffSM.
 double getKLDivergence(double[][][] ds)
          Returns the KL-divergence of the distribution of this BNDiffSMParameterTree and the distribution given by ds.
 double getKLDivergence(double[] weight, double[][][][] distribution)
          Returns the KL-divergence of the distribution of this BNDiffSMParameterTree and a number of distributions given by distribution and weighted by weight.
 byte getMaximalMarkovOrder()
          Returns the maximal Markov order of this tree.
 double getMaximumScore()
          Returns the maximum score in this tree.
 int getNumberOfParents()
          Returns the number of parents for the random variable of this BNDiffSMParameterTree in the network structure of the enclosing BayesianNetworkDiffSM.
 BNDiffSMParameter getParameterFor(Sequence seq, int start)
          Returns the BNDiffSMParameter that is responsible for the suffix of sequence seq starting at position start.
 int[] getParameterIndexesForSamplingStep(int step, int offset)
          Returns the indexes of the parameters, incremented by offset, that shall be sampled in step step of a grouped sampling process.
 double getProbFor(Sequence sequence)
          Returns the probability of Sequence sequence in this BNDiffSMParameterTree.
 void initializeRandomly(double ess)
          Initializes the parameters of this BNDiffSMParameterTree randomly.
 void insertProbs(double[] probs)
          Computes the probabilities for a PWM, i.e.
 void invalidateNormalizers()
          Resets all pre-computed normalization constants.
 boolean isLeaf()
          Indicates if the random variable of this BNDiffSMParameterTree is a leaf, i.e.
 LinkedList<BNDiffSMParameter> linearizeParameters()
          Extracts the BNDiffSMParameters from the leaves of this tree in left-to-right order (as specified by the order of the alphabet) and returns them as a LinkedList.
 void normalizeParameters()
          Normalizes the parameter values to the corresponding log-probabilities.
 void normalizePlugInParameters()
          Starts the normalization of the plug-in parameters to the logarithm of the MAP-estimates.
 void print()
          Prints the structure of this tree.
 void setParameterFor(int symbol, int[][] context, BNDiffSMParameter par)
          Sets the instance of the BNDiffSMParameter for symbol symbol and context context to BNDiffSMParameter par.
 String toHtml(NumberFormat nf)
          Returns an HTML representation of this tree.
 String toString(NumberFormat nf)
          Returns a string representation of this tree using the provided NumberFormat.
 StringBuffer toXML()
          Works as defined in Storable.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BNDiffSMParameterTree

public BNDiffSMParameterTree(int pos,
                             int[] contextPoss,
                             AlphabetContainer alphabet,
                             int firstParent,
                             int[] firstChildren)
Creates a new BNDiffSMParameterTree for the parameters at position pos using the parent positions in contextPoss. These are used to extract the correct alphabet out of alphabet for every context position. The first parent is the first parent of the random variable at pos as given by the topological ordering of the network structure of the enclosing BayesianNetworkDiffSM. The first children are the children the random variable at pos is the first parent for.

Parameters:
pos - the position of the random variable of the parameters in the tree
contextPoss - the positions of the context
alphabet - the alphabet of the enclosing BayesianNetworkDiffSM
firstParent - the first parent of this random variable, or -1 if the random variable has no parent
firstChildren - the first children of this random variable

BNDiffSMParameterTree

public BNDiffSMParameterTree(StringBuffer source)
                      throws NonParsableException
Recreates a BNDiffSMParameterTree from its XML representation as returned by toXML(). BNDiffSMParameterTree does not implement the Storable interface to recycle the AlphabetContainer of the enclosing BayesianNetworkDiffSM, but besides the different constructor works like any implementation of Storable.

Parameters:
source - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML code could not be parsed
Method Detail

clone

public BNDiffSMParameterTree clone()
                            throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

toString

public String toString(NumberFormat nf)
Returns a string representation of this tree using the provided NumberFormat.

Parameters:
nf - the number format
Returns:
the string representation

insertProbs

public void insertProbs(double[] probs)
                 throws Exception
Computes the probabilities for a PWM, i.e. the parameters in the tree have an empty context, and inserts them into probs. Used by BayesianNetworkDiffSM.getPWM().

Parameters:
probs - the array to store the probabilities for a PWM
Throws:
Exception - if the tree has a non-empty context

linearizeParameters

public LinkedList<BNDiffSMParameter> linearizeParameters()
Extracts the BNDiffSMParameters from the leaves of this tree in left-to-right order (as specified by the order of the alphabet) and returns them as a LinkedList.

Returns:
the BNDiffSMParameters from the leaves in linear order
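
The left-to-right extraction described above can be sketched as a depth-first traversal. This is a minimal illustration, not the Jstacs implementation: the Node class, its children and leafParams fields, and the use of plain double values in place of BNDiffSMParameter objects are all assumptions made for the example.

```java
import java.util.LinkedList;

/** Hypothetical stand-in for a context tree; not the Jstacs TreeElement class. */
final class LinearizeSketch {

    /** Inner nodes have one child per context symbol; leaves hold parameter values. */
    static final class Node {
        Node[] children;      // null for a leaf
        double[] leafParams;  // null for an inner node

        static Node leaf(double... params) {
            Node n = new Node();
            n.leafParams = params;
            return n;
        }

        static Node inner(Node... children) {
            Node n = new Node();
            n.children = children;
            return n;
        }
    }

    /** Collects the leaf values in the order given by the child arrays (alphabet order). */
    static LinkedList<Double> linearize(Node root) {
        LinkedList<Double> out = new LinkedList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, LinkedList<Double> out) {
        if (node.children == null) {
            for (double p : node.leafParams) {
                out.add(p);
            }
            return;
        }
        for (Node child : node.children) {
            collect(child, out); // visit children in alphabet order
        }
    }
}
```

Visiting the children in array order is what keeps the resulting list aligned with the order of the alphabet.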

isLeaf

public boolean isLeaf()
Indicates if the random variable of this BNDiffSMParameterTree is a leaf, i.e. it has no children in the network structure of the enclosing BayesianNetworkDiffSM.

Returns:
true if this tree is a leaf, false otherwise

getNumberOfParents

public int getNumberOfParents()
Returns the number of parents for the random variable of this BNDiffSMParameterTree in the network structure of the enclosing BayesianNetworkDiffSM. This corresponds to the length of the context or the depth of the tree.

Returns:
the number of parents

print

public void print()
Prints the structure of this tree.


getParameterFor

public BNDiffSMParameter getParameterFor(Sequence seq,
                                         int start)
Returns the BNDiffSMParameter that is responsible for the suffix of sequence seq starting at position start.

Parameters:
seq - the Sequence
start - the first position in the suffix
Returns:
the BNDiffSMParameter that is responsible for the suffix
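
As a rough illustration of how a parameter can be selected from a suffix, the sketch below flattens the tree into a single array and walks one level per parent position. It assumes integer-encoded sequences, a single alphabet size for all positions, and positions given relative to start; the class ContextLookupSketch and its methods are hypothetical and are not part of Jstacs.

```java
/** Hypothetical flattened context tree; the real class keeps BNDiffSMParameter objects in tree nodes. */
final class ContextLookupSketch {
    private final int pos;            // position of the random variable, relative to start
    private final int[] contextPoss;  // positions of its parents, relative to start
    private final int alphabetSize;   // assumed identical for all positions
    private final double[] values;    // one value per (context, symbol) combination

    ContextLookupSketch(int pos, int[] contextPoss, int alphabetSize) {
        this.pos = pos;
        this.contextPoss = contextPoss.clone();
        this.alphabetSize = alphabetSize;
        int n = alphabetSize;
        for (int i = 0; i < contextPoss.length; i++) {
            n *= alphabetSize;
        }
        this.values = new double[n];
    }

    /** Index of the entry responsible for the suffix of seq starting at start. */
    int indexFor(int[] seq, int start) {
        int idx = 0;
        for (int c : contextPoss) {
            idx = idx * alphabetSize + seq[start + c]; // descend one level per parent
        }
        return idx * alphabetSize + seq[start + pos];  // select the symbol within the leaf
    }

    /** Adds count to the entry selected by indexFor, analogous to addCount(Sequence, int, double). */
    void addCount(int[] seq, int start, double count) {
        values[indexFor(seq, start)] += count;
    }

    double get(int[] seq, int start) {
        return values[indexFor(seq, start)];
    }
}
```

With two parents the lookup descends two levels, which matches the statement that the number of parents equals the depth of the tree.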

setParameterFor

public void setParameterFor(int symbol,
                            int[][] context,
                            BNDiffSMParameter par)
Sets the instance of the BNDiffSMParameter for symbol symbol and context context to BNDiffSMParameter par.

Parameters:
symbol - the symbol
context - the context
par - the new BNDiffSMParameter instance

invalidateNormalizers

public void invalidateNormalizers()
Resets all pre-computed normalization constants.


forward

public double forward(BNDiffSMParameterTree[] trees)
               throws RuntimeException
Computes the forward-part of the normalization constant starting from this BNDiffSMParameterTree. This is only possible for roots, i.e. BNDiffSMParameterTrees that do not have parents in the network structure of the enclosing BayesianNetworkDiffSM.

Parameters:
trees - the array of all trees of the enclosing BayesianNetworkDiffSM
Returns:
the forward-part of the normalization constant
Throws:
RuntimeException - if this BNDiffSMParameterTree is not a root
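
To illustrate what a forward computation of a normalization constant looks like, the sketch below handles the simplest case, a first-order chain in which each position has exactly one parent (its predecessor). It is not the algorithm used by this class for general network structures; the class name ForwardSketch and the array layout are assumptions, and the sums are computed in linear space for brevity where a robust implementation would likely work in log space.

```java
/** Forward recursion for the normalization constant of a first-order chain (illustrative only). */
final class ForwardSketch {

    /**
     * @param theta0 theta0[a] is the unnormalized log-parameter of symbol a at position 0
     * @param theta  theta[l][a][b] is the unnormalized log-parameter of symbol b at
     *               position l+1 given symbol a at position l
     * @return the sum over all sequences of exp(sum of the selected parameters)
     */
    static double normalizationConstant(double[] theta0, double[][][] theta) {
        int k = theta0.length;
        double[] f = new double[k];
        for (int a = 0; a < k; a++) {
            f[a] = Math.exp(theta0[a]); // root position: no parent
        }
        for (double[][] t : theta) {
            double[] next = new double[k];
            for (int a = 0; a < k; a++) {
                for (int b = 0; b < k; b++) {
                    next[b] += f[a] * Math.exp(t[a][b]); // sum out the parent symbol
                }
            }
            f = next;
        }
        double z = 0.0;
        for (double v : f) {
            z += v;
        }
        return z;
    }
}
```

If all parameters are already log-probabilities, the result is 1, so the log-normalization constant is 0.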

backward

public void backward(BNDiffSMParameterTree[] trees,
                     int[][] order)
              throws RuntimeException
Computes the backward-part of the normalization constant starting from this BNDiffSMParameterTree. This is only possible for leaves, i.e. BNDiffSMParameterTrees that do not have children in the network structure of the enclosing BayesianNetworkDiffSM.

Parameters:
trees - the array of all trees of the enclosing BayesianNetworkDiffSM
order - the topological ordering as returned by TopSort.getTopologicalOrder(int[][])
Throws:
RuntimeException - if this BNDiffSMParameterTree is not a leaf

addCount

public void addCount(Sequence seq,
                     int start,
                     double count)
Adds count to the parameter as returned by getParameterFor(Sequence, int).

Parameters:
seq - the sequence
start - the first position of the suffix of seq
count - the added count

normalizePlugInParameters

public void normalizePlugInParameters()
Starts the normalization of the plug-in parameters to the logarithm of the MAP-estimates.


normalizeParameters

public void normalizeParameters()
Normalizes the parameter values to the corresponding log-probabilities. After this step, BayesianNetworkDiffSM.getLogNormalizationConstant() should return 0.
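
For a single set of parameters on one simplex, normalization to log-probabilities amounts to subtracting the log of the summed exponentials. The following sketch shows only that step; NormalizeSketch and its methods are hypothetical names, and the handling of dependencies between trees in the full network is not covered.

```java
/** Normalizes unnormalized log-parameters of one simplex to log-probabilities (illustrative only). */
final class NormalizeSketch {

    /** Numerically stable log(sum_i exp(v[i])). */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.exp(x - max); // shift by the maximum to avoid overflow
        }
        return max + Math.log(sum);
    }

    /** Subtracts the log-normalizer in place, so that the exponentials sum to 1. */
    static void normalize(double[] theta) {
        double lse = logSumExp(theta);
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= lse;
        }
    }
}
```

After the call, logSumExp of the same array is 0 up to rounding, which is the single-simplex analogue of a log-normalization constant of 0.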


divideByUnfree

public void divideByUnfree()
Divides each of the normalized parameters on a simplex by the last BNDiffSMParameter, which is defined not to be free. In the log-space this amounts to subtracting the value of the last BNDiffSMParameter from all BNDiffSMParameters on the simplex.
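
The log-space subtraction can be sketched as follows for one simplex stored as a plain array; DivideByUnfreeSketch is a hypothetical helper, not a Jstacs class.

```java
/** Subtracts the last (non-free) log-parameter from all log-parameters of one simplex. */
final class DivideByUnfreeSketch {

    static void divideByLast(double[] logParams) {
        double last = logParams[logParams.length - 1];
        for (int i = 0; i < logParams.length; i++) {
            logParams[i] -= last; // division in probability space is subtraction in log-space
        }
    }
}
```

The last entry becomes 0, and the remaining entries hold log-ratios relative to it, so the original probabilities can still be recovered by exponentiating and renormalizing.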


toXML

public StringBuffer toXML()
Works as defined in Storable. Returns an XML representation of this BNDiffSMParameterTree.

Specified by:
toXML in interface Storable
Returns:
the XML representation of this BNDiffSMParameterTree
See Also:
Storable.toXML()

getFirstParent

public int getFirstParent()
Returns the first parent of the random variable of this BNDiffSMParameterTree in the topological ordering of the network structure of the enclosing BayesianNetworkDiffSM.

Returns:
the first parent

drawKLDivergences

public void drawKLDivergences(double weight,
                              double[] kls,
                              int startIdx,
                              int endIdx,
                              double[][][] contrast,
                              double samples)
Draws KL-divergences between the distribution given by contrast and endIdx-startIdx distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples. The drawn KL-divergences are stored in kls between startIdx (inclusive) and endIdx (exclusive).

Parameters:
weight - a weight on the KL-divergences
kls - the array of KL-divergences which is filled
startIdx - the first index
endIdx - the index after the last index
contrast - the distribution to check against
samples - number of sequences + equivalent sample size
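
The sampling scheme can be sketched for a single, unconditional distribution as shown below. Several points are assumptions rather than documented behavior: the direction of the KL-divergence (here from contrast to the drawn distribution), the multiplication of each divergence by weight, and the flat double[] layout in place of double[][][]. The gamma sampler follows the Marsaglia-Tsang method, since the Java standard library does not provide one; DirichletKlSketch and its methods are hypothetical names.

```java
import java.util.Random;

/** Draws distributions from a Dirichlet density centered on a contrast and records KL-divergences. */
final class DirichletKlSketch {

    /** Marsaglia-Tsang sampler for Gamma(shape, 1). */
    static double sampleGamma(double shape, Random rnd) {
        if (shape < 1.0) {
            // Boosting: Gamma(a) is distributed as Gamma(a + 1) * U^(1/a)
            return sampleGamma(shape + 1.0, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rnd.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue;
            }
            v = v * v * v;
            double u = rnd.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /** Draws one point from a Dirichlet density by normalizing independent gamma draws. */
    static double[] sampleDirichlet(double[] alpha, Random rnd) {
        double[] q = new double[alpha.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            q[i] = sampleGamma(alpha[i], rnd);
            sum += q[i];
        }
        for (int i = 0; i < q.length; i++) {
            q[i] /= sum;
        }
        return q;
    }

    /** KL(p || q) = sum_i p[i] * log(p[i] / q[i]); terms with p[i] == 0 contribute 0. */
    static double kl(double[] p, double[] q) {
        double s = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                s += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return s;
    }

    /** Fills kls[startIdx..endIdx-1]; hyper-parameters are contrast[i] * samples. */
    static void drawKls(double weight, double[] kls, int startIdx, int endIdx,
                        double[] contrast, double samples, Random rnd) {
        double[] alpha = new double[contrast.length];
        for (int i = 0; i < alpha.length; i++) {
            alpha[i] = contrast[i] * samples;
        }
        for (int i = startIdx; i < endIdx; i++) {
            kls[i] = weight * kl(contrast, sampleDirichlet(alpha, rnd)); // assumed use of weight
        }
    }
}
```

Larger values of samples concentrate the Dirichlet density around contrast, so the drawn divergences shrink towards 0.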

getKLDivergence

public double getKLDivergence(double[][][] ds)
Returns the KL-divergence of the distribution of this BNDiffSMParameterTree and the distribution given by ds.

Parameters:
ds - the distribution
Returns:
the KL-divergence

getKLDivergence

public double getKLDivergence(double[] weight,
                              double[][][][] distribution)
Returns the KL-divergence of the distribution of this BNDiffSMParameterTree and a number of distributions given by distribution and weighted by weight.

Parameters:
distribution - the distribution
weight - the weights on the distributions
Returns:
the KL-divergence

drawKLDivergences

public void drawKLDivergences(double[] kls,
                              double[] weights,
                              double[][][][] contrast,
                              double samples)
Draws KL-divergences between the distributions given by contrast[i], each weighted by weights[i], and kls.length distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples. The drawn KL-divergences are added to the entries of kls.

Parameters:
kls - the array of KL-divergences which is filled
weights - the weights on the distributions in contrast
contrast - the distribution to check against
samples - number of sequences + equivalent sample size

fill

public void fill(double[] weight,
                 double[][][][] distribution)
Fills all parameters with the probabilities given in distribution.

Parameters:
distribution - the distribution
weight - the weights on the distributions

copy

public void copy(BNDiffSMParameterTree parameterTree)
Copies the values of the parameters from another BNDiffSMParameterTree.

Parameters:
parameterTree - the template

initializeRandomly

public void initializeRandomly(double ess)
Initializes the parameters of this BNDiffSMParameterTree randomly.

Parameters:
ess - the equivalent sample size

computeGammaNorm

public Double computeGammaNorm()
Computes the Gamma-normalization for the prior.

Returns:
the Gamma-normalization

getProbFor

public double getProbFor(Sequence sequence)
Returns the probability of Sequence sequence in this BNDiffSMParameterTree.

Parameters:
sequence - the sequence
Returns:
the probability

getParameterIndexesForSamplingStep

public int[] getParameterIndexesForSamplingStep(int step,
                                                int offset)
Returns the indexes of the parameters, incremented by offset, that shall be sampled in step step of a grouped sampling process.

Parameters:
step - the step
offset - the offset on the parameter indexes
Returns:
the indexes of the group of parameters

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
Returns the maximal Markov order of this tree.

Returns:
the order

getMaximumScore

public double getMaximumScore()
Returns the maximum score in this tree.

Returns:
the score

toHtml

public String toHtml(NumberFormat nf)
Returns an HTML representation of this tree.

Parameters:
nf - the number format
Returns:
the HTML representation