de.jstacs.scoringFunctions.directedGraphicalModels
Class ParameterTree

java.lang.Object
  extended by de.jstacs.scoringFunctions.directedGraphicalModels.ParameterTree
All Implemented Interfaces:
Storable, Cloneable

public class ParameterTree
extends Object
implements Cloneable, Storable

Class for the tree that represents the context of a Parameter in a BayesianNetworkScoringFunction.

Author:
Jan Grau

Nested Class Summary
 class ParameterTree.TreeElement
          Class for the nodes of a ParameterTree.
 
Constructor Summary
ParameterTree(int pos, int[] contextPoss, AlphabetContainer alphabet, int firstParent, int[] firstChildren)
          Creates a new ParameterTree for the parameters at position pos using the parent positions in contextPoss.
ParameterTree(StringBuffer source)
          Recreates a ParameterTree from its XML representation as returned by toXML().
 
Method Summary
 void addCount(Sequence seq, int start, double count)
          Adds count to the parameter as returned by getParameterFor(Sequence, int).
 void backward(ParameterTree[] trees, int[][] order)
          Computes the backward-part of the normalization constant starting from this ParameterTree.
 ParameterTree clone()
           
 Double computeGammaNorm()
          Computes the Gamma-normalization for the prior.
 void copy(ParameterTree parameterTree)
          Copies the values of the parameters from another ParameterTree.
 void divideByUnfree()
          Divides each of the normalized parameters on a simplex by the last Parameter, which is defined not to be free.
 void drawKLDivergences(double[] kls, double[] weights, double[][][][] contrast, double samples)
          Draws KL-divergences between the distributions given by contrast[i], each weighted by weights[i], and kls.length distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples.
 void drawKLDivergences(double weight, double[] kls, int startIdx, int endIdx, double[][][] contrast, double samples)
          Draws KL-divergences between the distribution given by contrast and endIdx-startIdx distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples.
 void fill(double[] weight, double[][][][] distribution)
          Fills all parameters with the probabilities given in distribution.
 double forward(ParameterTree[] trees)
          Computes the forward-part of the normalization constant starting from this ParameterTree.
 int getFirstParent()
          Returns the first parent of the random variable of this ParameterTree in the topological ordering of the network structure of the enclosing BayesianNetworkScoringFunction.
 double getKLDivergence(double[][][] ds)
          Returns the KL-divergence of the distribution of this ParameterTree and the distribution given by ds.
 double getKLDivergence(double[] weight, double[][][][] distribution)
          Returns the KL-divergence of the distribution of this ParameterTree and a number of distributions given by distribution, weighted by weight.
 int getNumberOfParents()
          Returns the number of parents for the random variable of this ParameterTree in the network structure of the enclosing BayesianNetworkScoringFunction.
 Parameter getParameterFor(Sequence seq, int start)
          Returns the Parameter that is responsible for the suffix of sequence seq starting at position start.
 int[] getParameterIndexesForSamplingStep(int step, int offset)
          Returns the indexes of the parameters, incremented by offset, that shall be sampled in step step of a grouped sampling process.
 double getProbFor(Sequence sequence)
          Returns the probability of Sequence sequence in this ParameterTree.
 void initializeRandomly(double ess)
          Initializes the parameters of this ParameterTree randomly.
 void insertProbs(double[] probs)
          Computes the probabilities for a PWM, i.e. the parameters in the tree have an empty context, and inserts them into probs.
 void invalidateNormalizers()
          Resets all pre-computed normalization constants.
 boolean isLeaf()
          Indicates if the random variable of this ParameterTree is a leaf, i.e. it has no children in the network structure of the enclosing BayesianNetworkScoringFunction.
 LinkedList<Parameter> linearizeParameters()
          Extracts the Parameters from the leaves of this tree in left-to-right order (as specified by the order of the alphabet) and returns them as a LinkedList.
 void normalizeParameters()
          Normalizes the parameter values to the corresponding log-probabilities.
 void normalizePlugInParameters()
          Starts the normalization of the plug-in parameters to the logarithm of the MAP-estimates.
 void print()
          Prints the structure of this tree.
 void setParameterFor(int symbol, int[][] context, Parameter par)
          Sets the instance of the Parameter for symbol symbol and context context to Parameter par.
 String toString()
           
 StringBuffer toXML()
          Works as defined in Storable.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ParameterTree

public ParameterTree(int pos,
                     int[] contextPoss,
                     AlphabetContainer alphabet,
                     int firstParent,
                     int[] firstChildren)
Creates a new ParameterTree for the parameters at position pos using the parent positions in contextPoss. These positions are used to extract the correct alphabet from alphabet for every context position. The first parent is the first parent of the random variable at pos in the topological ordering of the network structure of the enclosing BayesianNetworkScoringFunction. The first children are those children of the random variable at pos for which it is the first parent.

Parameters:
pos - the position of the random variable of the parameters in the tree
contextPoss - the positions of the context
alphabet - the alphabet of the enclosing BayesianNetworkScoringFunction
firstParent - the first parent of this random variable, or -1 if the random variable has no parent
firstChildren - the first children of this random variable
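The tree has one level per context position, so its size grows with the alphabets of the parents. As a rough illustration only (not Jstacs code), the hypothetical helper below assumes a discrete alphabet of alphabetSizes[p] symbols at each position p and a full tree with one branch per symbol at every parent level; the actual ParameterTree may organize its nodes differently.

```java
/** Hypothetical sketch: size of a full context tree (not part of the Jstacs API). */
final class TreeShapeSketch {

    /** Number of leaves, i.e. distinct contexts; an empty context yields a single leaf. */
    static int numberOfContexts(int[] contextPoss, int[] alphabetSizes) {
        int contexts = 1;
        for (int p : contextPoss) {
            // each parent level branches once per symbol at that parent position
            contexts *= alphabetSizes[p];
        }
        return contexts;
    }

    /** Total number of parameters: one simplex over the symbols at pos per context. */
    static int numberOfParameters(int pos, int[] contextPoss, int[] alphabetSizes) {
        return numberOfContexts(contextPoss, alphabetSizes) * alphabetSizes[pos];
    }
}
```

For DNA (four symbols per position) and two parents, this gives 16 contexts and 64 parameters; with no parents (the PWM case) there is one context with 4 parameters.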

ParameterTree

public ParameterTree(StringBuffer source)
              throws NonParsableException
Recreates a ParameterTree from its XML representation as returned by toXML(). ParameterTree does not strictly follow the Storable contract, so that it can recycle the AlphabetContainer of the enclosing BayesianNetworkScoringFunction; apart from the different constructor, it works like any implementation of Storable.

Parameters:
source - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML code could not be parsed

Method Detail

clone

public ParameterTree clone()
                    throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

toString

public String toString()
Overrides:
toString in class Object

insertProbs

public void insertProbs(double[] probs)
                 throws Exception
Computes the probabilities for a PWM, i.e. the parameters in the tree have an empty context, and inserts them into probs. Used by BayesianNetworkScoringFunction.getPWM().

Parameters:
probs - the array to store the probabilities for a PWM
Throws:
Exception - if the tree has a non-empty context

linearizeParameters

public LinkedList<Parameter> linearizeParameters()
Extracts the Parameters from the leaves of this tree in left-to-right order (as specified by the order of the alphabet) and returns them as a LinkedList.

Returns:
the Parameters from the leaves in linear order
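Left-to-right order here amounts to a depth-first traversal that visits the children of each node in alphabet order. The sketch below shows this on a hypothetical Node class, with String labels standing in for Parameter objects; it is not the Jstacs TreeElement implementation.

```java
import java.util.LinkedList;

/** Hypothetical sketch of left-to-right leaf linearization (not part of the Jstacs API). */
final class LinearizeSketch {

    static final class Node {
        final Node[] children;  // inner node: one child per symbol, in alphabet order
        final String[] params;  // leaf: one parameter label per symbol at pos

        Node(Node[] children) { this.children = children; this.params = null; }
        Node(String[] params) { this.children = null; this.params = params; }
    }

    static LinkedList<String> linearize(Node root) {
        LinkedList<String> out = new LinkedList<>();
        collect(root, out);
        return out;
    }

    // depth-first: a leaf emits its parameters, an inner node recurses child by child
    private static void collect(Node node, LinkedList<String> out) {
        if (node.children == null) {
            for (String p : node.params) {
                out.add(p);
            }
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }
}
```

For a single parent with symbols A and C, the leaves are emitted context by context: the parameters of context A first, then those of context C.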

isLeaf

public boolean isLeaf()
Indicates if the random variable of this ParameterTree is a leaf, i.e. it has no children in the network structure of the enclosing BayesianNetworkScoringFunction.

Returns:
true if this tree is a leaf, false otherwise

getNumberOfParents

public int getNumberOfParents()
Returns the number of parents for the random variable of this ParameterTree in the network structure of the enclosing BayesianNetworkScoringFunction. This corresponds to the length of the context, i.e. the depth of the tree.

Returns:
the number of parents

print

public void print()
Prints the structure of this tree.


getParameterFor

public Parameter getParameterFor(Sequence seq,
                                 int start)
Returns the Parameter that is responsible for the suffix of sequence seq starting at position start.

Parameters:
seq - the Sequence
start - the first position in the suffix
Returns:
the Parameter that is responsible for the suffix
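Finding the responsible parameter means descending one tree level per parent, branching on the symbol found at that parent position, and then selecting the entry for the symbol at pos within the leaf. For a full tree this descent is equivalent to a mixed-radix index, which the hypothetical sketch below computes over integer-encoded symbols. It assumes that positions are offsets relative to start; it is an illustration, not the Jstacs implementation.

```java
/** Hypothetical flattened equivalent of walking a full context tree (not the Jstacs API). */
final class ParameterLookupSketch {

    /**
     * Returns the index of the parameter responsible for the symbol at start + pos,
     * given the symbols at start + contextPoss[d]. Symbols are assumed to be 0-based ints.
     */
    static int parameterIndex(int[] seq, int start, int pos, int[] contextPoss, int[] alphabetSizes) {
        int context = 0;
        for (int p : contextPoss) {
            // descend one level: branch on the symbol at this parent position
            context = context * alphabetSizes[p] + seq[start + p];
        }
        // within the leaf, the simplex is indexed by the symbol at pos
        return context * alphabetSizes[pos] + seq[start + pos];
    }
}
```

With DNA encoded as A=0, C=1, G=2, T=3, parents at offsets 0 and 1, and the suffix G C T, the context (G, C) selects leaf 2 * 4 + 1 = 9 and the symbol T gives index 9 * 4 + 3 = 39.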

setParameterFor

public void setParameterFor(int symbol,
                            int[][] context,
                            Parameter par)
Sets the instance of the Parameter for symbol symbol and context context to Parameter par.

Parameters:
symbol - the symbol
context - the context
par - the new Parameter instance

invalidateNormalizers

public void invalidateNormalizers()
Resets all pre-computed normalization constants.


forward

public double forward(ParameterTree[] trees)
               throws RuntimeException
Computes the forward-part of the normalization constant starting from this ParameterTree. This is only possible for roots, i.e. ParameterTrees that do not have parents in the network structure of the enclosing BayesianNetworkScoringFunction.

Parameters:
trees - the array of all trees of the enclosing BayesianNetworkScoringFunction
Returns:
the forward-part of the normalization constant
Throws:
RuntimeException - if this ParameterTree is not a root

backward

public void backward(ParameterTree[] trees,
                     int[][] order)
              throws RuntimeException
Computes the backward-part of the normalization constant starting from this ParameterTree. This is only possible for leaves, i.e. ParameterTrees that do not have children in the network structure of the enclosing BayesianNetworkScoringFunction.

Parameters:
trees - the array of all trees of the enclosing BayesianNetworkScoringFunction
order - the topological ordering as returned by TopSort.getTopologicalOrder(int[][])
Throws:
RuntimeException - if this ParameterTree is not a leaf

addCount

public void addCount(Sequence seq,
                     int start,
                     double count)
Adds count to the parameter as returned by getParameterFor(Sequence, int).

Parameters:
seq - the sequence
start - the first position of the suffix of seq
count - the added count

normalizePlugInParameters

public void normalizePlugInParameters()
Starts the normalization of the plug-in parameters to the logarithm of the MAP-estimates.


normalizeParameters

public void normalizeParameters()
Normalizes the parameter values to the corresponding log-probabilities. After this step, BayesianNetworkScoringFunction.getLogNormalizationConstant() should return 0.
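Normalizing to log-probabilities can be done per simplex by subtracting the log-sum-exp of its entries, which makes every simplex sum to 1 in linear space; this is consistent with the log-normalization constant becoming 0. The sketch below assumes a hypothetical flat layout in which consecutive blocks of k entries form one simplex (one context); it is not the Jstacs code.

```java
/** Hypothetical sketch: per-simplex normalization in log space (not the Jstacs API). */
final class LogNormalizeSketch {

    /** Normalizes each block of k log-parameters in place so that exp(block) sums to 1. */
    static void normalize(double[] params, int k) {
        for (int off = 0; off < params.length; off += k) {
            // subtract the maximum first to keep Math.exp from overflowing
            double max = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < k; a++) {
                max = Math.max(max, params[off + a]);
            }
            double sum = 0.0;
            for (int a = 0; a < k; a++) {
                sum += Math.exp(params[off + a] - max);
            }
            double logZ = max + Math.log(sum); // log-sum-exp of this simplex
            for (int a = 0; a < k; a++) {
                params[off + a] -= logZ;
            }
        }
    }
}
```

For the simplex (ln 1, ln 3), logZ = ln 4, so the normalized values are ln 0.25 and ln 0.75.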


divideByUnfree

public void divideByUnfree()
Divides each of the normalized parameters on a simplex by the last Parameter, which is defined not to be free. In the log-space this amounts to subtracting the value of the last Parameter from all Parameters on the simplex.
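In log space, the division described above is a subtraction of the last entry of each simplex, after which that entry is exactly 0 (i.e. 1 in linear space) while all ratios between entries are preserved. A minimal sketch, using the same hypothetical flat layout of k entries per simplex (not the Jstacs code):

```java
/** Hypothetical sketch of dividing by the unfree parameter in log space (not the Jstacs API). */
final class UnfreeSketch {

    static void divideByUnfree(double[] params, int k) {
        for (int off = 0; off < params.length; off += k) {
            // read the last (unfree) parameter before modifying the simplex
            double last = params[off + k - 1];
            for (int a = 0; a < k; a++) {
                params[off + a] -= last; // log(p_a / p_last)
            }
        }
    }
}
```

Starting from (ln 0.25, ln 0.75), the result is (ln(1/3), 0).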


toXML

public StringBuffer toXML()
Works as defined in Storable. Returns an XML representation of this ParameterTree.

Specified by:
toXML in interface Storable
Returns:
the XML representation of this ParameterTree
See Also:
Storable.toXML()

getFirstParent

public int getFirstParent()
Returns the first parent of the random variable of this ParameterTree in the topological ordering of the network structure of the enclosing BayesianNetworkScoringFunction.

Returns:
the first parent

drawKLDivergences

public void drawKLDivergences(double weight,
                              double[] kls,
                              int startIdx,
                              int endIdx,
                              double[][][] contrast,
                              double samples)
Draws KL-divergences between the distribution given by contrast and endIdx-startIdx distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples. The drawn KL-divergences are stored in kls at the indexes from startIdx (inclusive) to endIdx (exclusive).

Parameters:
weight - a weight on the KL-divergences
kls - the array of KL-divergences which is filled
startIdx - the first index
endIdx - the index after the last index
contrast - the distribution to check against
samples - the number of sequences plus the equivalent sample size
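The sampling scheme can be illustrated for a single simplex: the Dirichlet hyper-parameters are contrast[a] * samples, each draw is normalized from independent Gamma variates, and the weighted divergence is written to kls. This is a hypothetical sketch, not the Jstacs implementation: it uses a flat double[] instead of double[][][], an explicit java.util.Random, the Marsaglia-Tsang Gamma sampler (chosen here for illustration), and it assumes the direction D(contrast || drawn), which the documentation does not specify.

```java
import java.util.Random;

/** Hypothetical single-simplex sketch of drawing KL-divergences (not the Jstacs API). */
final class DirichletKlSketch {

    /** Marsaglia-Tsang sampler for Gamma(shape, 1); shape < 1 uses the standard boost. */
    static double sampleGamma(double shape, Random rng) {
        if (shape < 1.0) {
            double u = 1.0 - rng.nextDouble(); // u in (0, 1]
            return sampleGamma(shape + 1.0, rng) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue;
            }
            v = v * v * v;
            double u = rng.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) {
                return d * v;
            }
        }
    }

    /** Draws one distribution from Dir(alpha) by normalizing independent Gamma variates. */
    static double[] sampleDirichlet(double[] alpha, Random rng) {
        double[] p = new double[alpha.length];
        double sum = 0.0;
        for (int a = 0; a < alpha.length; a++) {
            p[a] = sampleGamma(alpha[a], rng);
            sum += p[a];
        }
        for (int a = 0; a < alpha.length; a++) {
            p[a] /= sum;
        }
        return p;
    }

    /** D(p || q) in nats; terms with p[a] == 0 contribute 0. */
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int a = 0; a < p.length; a++) {
            if (p[a] > 0.0) {
                d += p[a] * Math.log(p[a] / q[a]);
            }
        }
        return d;
    }

    /** Stores weight * D(contrast || drawn) in kls[startIdx .. endIdx - 1]. */
    static void drawKLDivergences(double weight, double[] kls, int startIdx, int endIdx,
                                  double[] contrast, double samples, Random rng) {
        double[] alpha = new double[contrast.length];
        for (int a = 0; a < contrast.length; a++) {
            alpha[a] = contrast[a] * samples; // Dirichlet centered around contrast
        }
        for (int i = startIdx; i < endIdx; i++) {
            kls[i] = weight * kl(contrast, sampleDirichlet(alpha, rng));
        }
    }
}
```

A larger samples value concentrates the Dirichlet density around contrast, so the drawn divergences shrink toward 0.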

getKLDivergence

public double getKLDivergence(double[][][] ds)
Returns the KL-divergence of the distribution of this ParameterTree and the distribution given by ds.

Parameters:
ds - the distribution
Returns:
the KL-divergence
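For a single discrete distribution, the KL-divergence is the sum of p[a] * ln(p[a] / q[a]) over all symbols. The sketch below assumes p is the distribution of this tree and q the one given by ds, flattened to one simplex; the argument order and the use of natural logarithms are assumptions, and the Jstacs method works on double[][][] over all contexts.

```java
/** Hypothetical sketch of a discrete KL-divergence (not the Jstacs API). */
final class KlSketch {

    /** D(p || q) = sum_a p[a] * ln(p[a] / q[a]) in nats; terms with p[a] == 0 contribute 0. */
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int a = 0; a < p.length; a++) {
            if (p[a] > 0.0) {
                d += p[a] * Math.log(p[a] / q[a]);
            }
        }
        return d;
    }
}
```

Identical distributions give 0, and D((1, 0) || (0.5, 0.5)) = ln 2.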

getKLDivergence

public double getKLDivergence(double[] weight,
                              double[][][][] distribution)
Returns the KL-divergence of the distribution of this ParameterTree and a number of distributions given by distribution, weighted by weight.

Parameters:
distribution - the distribution
weight - the weights on the distributions
Returns:
the KL-divergence

drawKLDivergences

public void drawKLDivergences(double[] kls,
                              double[] weights,
                              double[][][][] contrast,
                              double samples)
Draws KL-divergences between the distributions given by contrast[i], each weighted by weights[i], and kls.length distributions drawn from a Dirichlet density centered around contrast, i.e. the hyper-parameters of the Dirichlet density are the probabilities of contrast weighted by samples. The drawn KL-divergences are added to the entries of kls.

Parameters:
kls - the array of KL-divergences to which the drawn values are added
weights - the weights on the distributions in contrast
contrast - the distributions to check against
samples - the number of sequences plus the equivalent sample size

fill

public void fill(double[] weight,
                 double[][][][] distribution)
Fills all parameters with the probabilities given in distribution.

Parameters:
distribution - the distributions
weight - the weights on the distributions

copy

public void copy(ParameterTree parameterTree)
Copies the values of the parameters from another ParameterTree.

Parameters:
parameterTree - the template

initializeRandomly

public void initializeRandomly(double ess)
Initializes the parameters of this ParameterTree randomly.

Parameters:
ess - the equivalent sample size

computeGammaNorm

public Double computeGammaNorm()
Computes the Gamma-normalization for the prior.

Returns:
the Gamma-normalization

getProbFor

public double getProbFor(Sequence sequence)
Returns the probability of Sequence sequence in this ParameterTree.

Parameters:
sequence - the sequence
Returns:
the probability

getParameterIndexesForSamplingStep

public int[] getParameterIndexesForSamplingStep(int step,
                                                int offset)
Returns the indexes of the parameters, incremented by offset, that shall be sampled in step step of a grouped sampling process.

Parameters:
step - the step
offset - the offset on the parameter indexes
Returns:
the indexes of the group of parameters