de.jstacs.sequenceScores.statisticalModels.trainable.hmm.states.emissions.discrete
Class AbstractConditionalDiscreteEmission

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.hmm.states.emissions.discrete.AbstractConditionalDiscreteEmission
All Implemented Interfaces:
SamplingComponent, SamplingFromStatistic, DifferentiableEmission, Emission, SamplingEmission, Storable, Cloneable
Direct Known Subclasses:
DiscreteEmission, ReferenceSequenceDiscreteEmission

public abstract class AbstractConditionalDiscreteEmission
extends Object
implements SamplingEmission, DifferentiableEmission

The abstract super class of discrete emissions.

Author:
Jens Keilwagen, Michael Scharfe, Jan Grau

Field Summary
protected  AlphabetContainer con
          The alphabet of the emissions
protected  int[] counter
          The counter for the sampling steps of each sampling.
protected  double[] ess
          The equivalent sample sizes for each condition
protected  double[][] grad
          The array for storing the gradients for each parameter
protected  double[][] hyperParams
          The hyper-parameters for the prior on the parameters
protected  double[] logNorm
          The log-normalization constants for each condition
protected  int offset
          The offset of the parameter indexes
protected  double[][] params
          The parameters of the emission
protected  File[] paramsFile
          The files for saving the parameters during the sampling.
protected  double[][] probs
          The parameters transformed to probabilities
protected  BufferedReader reader
          The reader for the paramsFile after a sampling.
protected  int samplingIndex
          The index of the current sampling.
protected  double[][] statistic
          The array for storing the statistics for each parameter
protected  BufferedWriter writer
          The writer for the paramsFile in a sampling.
 
Constructor Summary
protected AbstractConditionalDiscreteEmission(AlphabetContainer con, double[][] hyperParams)
          This is a simple constructor for an AbstractConditionalDiscreteEmission defining the individual hyper parameters.
protected AbstractConditionalDiscreteEmission(AlphabetContainer con, double[][] hyperParams, double[][] initHyperParams)
          This constructor creates an AbstractConditionalDiscreteEmission defining the individual hyper parameters for the prior used during training and initialization.
protected AbstractConditionalDiscreteEmission(AlphabetContainer con, int numberOfConditions, double ess)
          This is a simple constructor for an AbstractConditionalDiscreteEmission based on the equivalent sample size.
protected AbstractConditionalDiscreteEmission(StringBuffer xml)
          Creates an AbstractConditionalDiscreteEmission from its XML representation.
 
Method Summary
 void acceptParameters()
          This method accepts the drawn parameters.
 void addGradientOfLogPriorTerm(double[] gradient, int offset)
          This method computes the gradient of Emission.getLogPriorTerm() for each parameter of this model.
 void addToStatistic(boolean forward, int startPos, int endPos, double weight, Sequence seq)
          This method adds the weight to the internal sufficient statistic.
protected  void appendFurtherInformation(StringBuffer xml)
          This method appends further information to the XML representation.
 AbstractConditionalDiscreteEmission clone()
           
protected  void drawParameters(double[][] hyper, boolean uniformBackup)
           
 void drawParametersFromStatistic()
          This method draws the parameters using a sufficient statistic representing a posteriori density.
 void estimateFromStatistic()
          This method estimates the parameters from the internal sufficient statistic.
 void extendSampling(int start, boolean append)
          This method allows extending a sampling.
protected  void extractFurtherInformation(StringBuffer xml)
          This method extracts further information from the XML representation.
 void fillCurrentParameter(double[] params)
          Fills the current parameters in the global params array using the internal offset.
 void fillSamplingGroups(int parameterOffset, LinkedList<int[]> list)
          Adds the groups of indexes of those parameters of this emission that should be sampled together in one step of a grouped sampling procedure, each as an int[], into list.
protected  void finalize()
           
protected  void fromXML(StringBuffer xml)
          This method is internally used by the constructor AbstractConditionalDiscreteEmission(StringBuffer).
 AlphabetContainer getAlphabetContainer()
          This method returns the AlphabetContainer of this emission.
protected abstract  int getConditionIndex(boolean forward, int seqPos, Sequence seq)
          This method returns an index encoding the condition.
protected static double[][] getHyperParams(double ess, int numConditions, int numEmissions)
          Returns the hyper-parameters for all parameters and a given ess.
 double getLogGammaScoreFromStatistic()
          This method calculates a score for the current statistics, which is independent of the current parameters. In general, the gamma-score is a product of gamma functions parameterized with the current statistics.
 double getLogPosteriorFromStatistic()
          This method calculates the a-posteriori probability for the current statistics.
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbAndPartialDerivationFor(boolean forward, int startPos, int endPos, IntList indices, DoubleList partDer, Sequence seq)
          Returns the logarithmic score for a Sequence beginning at position startPos in the Sequence and fills lists with the indices and the partial derivatives.
 double getLogProbFor(boolean forward, int startPos, int endPos, Sequence seq)
          This method computes the logarithm of the likelihood.
 String getNodeLabel(double weight, String name, NumberFormat nf)
          Returns the graphviz label of the node containing this emission.
 String getNodeShape(boolean forward)
          Returns the graphviz string for the shape of the node.
 int getNumberOfParameters()
          Returns the number of parameters of this emission.
 int getSizeOfEventSpace()
          Returns the size of the event space, i.e., the number of possible outcomes, for the random variables of this emission.
 void initForSampling(int starts)
          This method initializes the instance for the sampling.
 void initializeFunctionRandomly()
          This method initializes the emission randomly.
 boolean isInSamplingMode()
          This method returns true if the object is currently used in a sampling, otherwise false.
 void joinStatistics(Emission... emissions)
          This method joins the statistics of different instances and sets this joined statistic as statistic of each instance.
 boolean parseNextParameterSet()
          This method allows the user to parse the next set of parameters (from a file).
 boolean parseParameterSet(int start, int n)
          This method allows the user to parse the set of parameters with index n of a certain sampling (from a file).
protected  void precompute()
          This method precomputes some normalization constant and probabilities.
 void resetStatistic()
          This method resets the internal sufficient statistic.
 void samplingStopped()
          This method is the opposite of the method SamplingComponent.extendSampling(int, boolean).
 void setLinear(boolean linear)
          If set to true, the probabilities are mapped to colors directly; otherwise a logistic mapping is used to emphasize deviations from the uniform distribution.
 void setParameter(double[] params, int offset)
          This method sets the internal parameters using the given global parameter array, the global offset of the HMM and the internal offset.
 int setParameterOffset(int offset)
          This method sets the internal parameter offset and returns the new parameter offset for further use.
 void setParameters(Emission t)
          Sets the values of the parameters of this instance to the values of the parameters of the given instance.
 void setShape(String shape)
          Sets the graphviz shape of the node that uses this emission to some non-standard value (standard is "house").
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.hmm.states.emissions.Emission
toString
 

Field Detail

paramsFile

protected File[] paramsFile
The files for saving the parameters during the sampling.


counter

protected int[] counter
The counter for the sampling steps of each sampling.


samplingIndex

protected int samplingIndex
The index of the current sampling.


writer

protected BufferedWriter writer
The writer for the paramsFile in a sampling.


reader

protected BufferedReader reader
The reader for the paramsFile after a sampling.


offset

protected int offset
The offset of the parameter indexes


con

protected AlphabetContainer con
The alphabet of the emissions


params

protected double[][] params
The parameters of the emission


probs

protected double[][] probs
The parameters transformed to probabilities


hyperParams

protected double[][] hyperParams
The hyper-parameters for the prior on the parameters


statistic

protected double[][] statistic
The array for storing the statistics for each parameter


grad

protected double[][] grad
The array for storing the gradients for each parameter


logNorm

protected double[] logNorm
The log-normalization constants for each condition


ess

protected double[] ess
The equivalent sample sizes for each condition

Constructor Detail

AbstractConditionalDiscreteEmission

protected AbstractConditionalDiscreteEmission(AlphabetContainer con,
                                              int numberOfConditions,
                                              double ess)
This is a simple constructor for an AbstractConditionalDiscreteEmission based on the equivalent sample size.

Parameters:
con - the AlphabetContainer of this emission
numberOfConditions - the number of conditions
ess - the equivalent sample size (ess) of this emission that is equally distributed over all parameters
See Also:
AbstractConditionalDiscreteEmission(AlphabetContainer, double[][])

AbstractConditionalDiscreteEmission

protected AbstractConditionalDiscreteEmission(AlphabetContainer con,
                                              double[][] hyperParams)
This is a simple constructor for an AbstractConditionalDiscreteEmission defining the individual hyper parameters.

Parameters:
con - the AlphabetContainer of this emission
hyperParams - the individual hyper parameters for each parameter
See Also:
AbstractConditionalDiscreteEmission(AlphabetContainer, double[][])

AbstractConditionalDiscreteEmission

protected AbstractConditionalDiscreteEmission(AlphabetContainer con,
                                              double[][] hyperParams,
                                              double[][] initHyperParams)
This constructor creates an AbstractConditionalDiscreteEmission defining the individual hyper parameters for the prior used during training and initialization.

Parameters:
con - the AlphabetContainer of this emission
hyperParams - the individual hyper parameters for each parameter (used during training)
initHyperParams - the individual hyper parameters for each parameter used in initializeFunctionRandomly()

AbstractConditionalDiscreteEmission

protected AbstractConditionalDiscreteEmission(StringBuffer xml)
                                       throws NonParsableException
Creates an AbstractConditionalDiscreteEmission from its XML representation.

Parameters:
xml - the XML representation.
Throws:
NonParsableException - if the XML representation could not be parsed
Method Detail

getHyperParams

protected static double[][] getHyperParams(double ess,
                                           int numConditions,
                                           int numEmissions)
Returns the hyper-parameters for all parameters and a given ess. The equivalent sample size is distributed evenly across all parameters.

Parameters:
ess - the equivalent sample size
numConditions - the number of conditions
numEmissions - the number of emissions, assumed to be equal for all conditions
Returns:
hyper-parameters for all parameters
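
A minimal sketch of the even distribution described above, assuming that each of the numConditions * numEmissions hyper-parameters simply receives the same share of ess. The class EssSketch and the method evenHyperParams are hypothetical names used for illustration only; this is not the library's source code.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class EssSketch {

    // Spreads the equivalent sample size evenly over all
    // numConditions * numEmissions hyper-parameters (assumed behaviour).
    static double[][] evenHyperParams(double ess, int numConditions, int numEmissions) {
        double[][] hyper = new double[numConditions][numEmissions];
        double share = ess / (numConditions * (double) numEmissions);
        for (double[] row : hyper) {
            Arrays.fill(row, share);
        }
        return hyper;
    }

    public static void main(String[] args) {
        // ess = 8 over 2 conditions and 4 symbols: each entry is 8 / (2 * 4)
        System.out.println(Arrays.deepToString(evenHyperParams(8.0, 2, 4)));
    }
}
```

Under this assumption the entries of each row sum to ess / numConditions.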

clone

public AbstractConditionalDiscreteEmission clone()
                                          throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

setShape

public void setShape(String shape)
Sets the graphviz shape of the node that uses this emission to some non-standard value (standard is "house").

Parameters:
shape - the shape of the node

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] gradient,
                                      int offset)
Description copied from interface: DifferentiableEmission
This method computes the gradient of Emission.getLogPriorTerm() for each parameter of this model. The results are added to the array gradient beginning at index (offset + internal offset).

Specified by:
addGradientOfLogPriorTerm in interface DifferentiableEmission
Parameters:
gradient - the array of gradients
offset - the start index of the HMM in the gradient array, where the partial derivatives for the parameters of the HMM shall be entered
See Also:
Emission.getLogPriorTerm(), DifferentiableEmission.setParameterOffset(int)

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: Emission
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Specified by:
getLogPriorTerm in interface Emission
Returns:
a value that is proportional to the log of the prior
See Also:
StatisticalModel.getLogPriorTerm()

getLogProbAndPartialDerivationFor

public double getLogProbAndPartialDerivationFor(boolean forward,
                                                int startPos,
                                                int endPos,
                                                IntList indices,
                                                DoubleList partDer,
                                                Sequence seq)
                                         throws OperationNotSupportedException
Description copied from interface: DifferentiableEmission
Returns the logarithmic score for a Sequence beginning at position startPos in the Sequence and fills lists with the indices and the partial derivatives.

Specified by:
getLogProbAndPartialDerivationFor in interface DifferentiableEmission
Parameters:
forward - a switch whether to use the forward or the reverse complementary strand of the sequence
startPos - the start position in the Sequence
endPos - the end position in the Sequence
indices - an IntList of indices, after method invocation the list should contain the indices i where $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
partDer - a DoubleList of partial derivatives, after method invocation the list should contain the corresponding $\frac{\partial \log score(seq)}{\partial \lambda_i}$ that are not zero
seq - the Sequence
Returns:
the logarithmic score for the Sequence
Throws:
OperationNotSupportedException - if forward==false and the reverse complement of the sequence can not be computed

getLogProbFor

public double getLogProbFor(boolean forward,
                            int startPos,
                            int endPos,
                            Sequence seq)
                     throws OperationNotSupportedException
Description copied from interface: Emission
This method computes the logarithm of the likelihood.

Specified by:
getLogProbFor in interface Emission
Parameters:
forward - whether to use the forward or the reverse strand
startPos - the start position
endPos - the end position
seq - the sequence
Returns:
the logarithm of the probability
Throws:
OperationNotSupportedException - if forward=false and the reverse complement of the sequence seq is not defined
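
To illustrate how a conditional discrete emission could score a stretch of a sequence, the following sketch sums the log probabilities of the observed symbols, where the condition index selects the row of the probability table. It assumes that startPos and endPos are both inclusive and represents the sequence and the per-position conditions as plain int arrays; LogProbSketch and its parameters are hypothetical and do not mirror the Jstacs types.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class LogProbSketch {

    // Sums log probs[condition][symbol] over positions startPos..endPos
    // (both inclusive, which is an assumption of this sketch).
    static double logProb(double[][] probs, int[] conditions, int[] symbols,
                          int startPos, int endPos) {
        double sum = 0.0;
        for (int pos = startPos; pos <= endPos; pos++) {
            sum += Math.log(probs[conditions[pos]][symbols[pos]]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] probs = { { 0.5, 0.5 }, { 0.9, 0.1 } };
        int[] conditions = { 0, 1, 1 };
        int[] symbols = { 0, 0, 1 };
        // evaluates log(0.5) + log(0.9) + log(0.1)
        System.out.println(logProb(probs, conditions, symbols, 0, 2));
    }
}
```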

initializeFunctionRandomly

public void initializeFunctionRandomly()
Description copied from interface: Emission
This method initializes the emission randomly.

Specified by:
initializeFunctionRandomly in interface Emission

precompute

protected void precompute()
This method precomputes some normalization constant and probabilities.

See Also:
logNorm, probs
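
One plausible reading of this precomputation is sketched below, assuming that the parameters of each condition are unnormalized log-probabilities, so that the log-normalization constant is the logarithm of the sum of their exponentials and the probabilities are the normalized exponentials. This parameterization is an assumption and not confirmed by this page; PrecomputeSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class PrecomputeSketch {

    // Log-normalization constant of one condition: log(sum_j exp(params[j])),
    // computed with the maximum subtracted for numerical stability.
    static double logNorm(double[] params) {
        double max = Double.NEGATIVE_INFINITY;
        for (double p : params) {
            max = Math.max(max, p);
        }
        double sum = 0.0;
        for (double p : params) {
            sum += Math.exp(p - max);
        }
        return max + Math.log(sum);
    }

    // Probabilities of one condition: probs[j] = exp(params[j] - logNorm).
    static double[] probs(double[] params) {
        double ln = logNorm(params);
        double[] res = new double[params.length];
        for (int j = 0; j < params.length; j++) {
            res[j] = Math.exp(params[j] - ln);
        }
        return res;
    }

    public static void main(String[] args) {
        double[] p = probs(new double[] { 0.0, 0.0, 0.0, 0.0 });
        // equal parameters yield a uniform distribution
        System.out.println(java.util.Arrays.toString(p));
    }
}
```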

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

appendFurtherInformation

protected void appendFurtherInformation(StringBuffer xml)
This method appends further information to the XML representation. It allows subclasses to save further parameters that are not defined in the superclass.

Parameters:
xml - the XML representation

fromXML

protected void fromXML(StringBuffer xml)
                throws NonParsableException
This method is internally used by the constructor AbstractConditionalDiscreteEmission(StringBuffer).

Parameters:
xml - the StringBuffer containing the xml representation of an instance
Throws:
NonParsableException - if the StringBuffer is not parsable
See Also:
AbstractConditionalDiscreteEmission(StringBuffer)

extractFurtherInformation

protected void extractFurtherInformation(StringBuffer xml)
                                  throws NonParsableException
This method extracts further information from the XML representation. It allows subclasses to read further parameters that are not defined in the superclass.

Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml

joinStatistics

public void joinStatistics(Emission... emissions)
Description copied from interface: Emission
This method joins the statistics of different instances and sets this joined statistic as statistic of each instance. This method might be used for instance in a multi-threaded optimization to join partial statistics.

Specified by:
joinStatistics in interface Emission
Parameters:
emissions - the emissions to be joined
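
The joining step can be pictured as follows, assuming that the statistics of all instances have the same shape and that "joining" means element-wise summation, with the total written back into every instance. JoinSketch works on bare arrays and is a hypothetical illustration, not the library code.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class JoinSketch {

    // Sums the given statistics element-wise and stores the total in each of
    // them (assumes all arrays have identical dimensions).
    static void join(double[][]... statistics) {
        if (statistics.length == 0) {
            return;
        }
        for (int c = 0; c < statistics[0].length; c++) {
            for (int j = 0; j < statistics[0][c].length; j++) {
                double total = 0.0;
                for (double[][] s : statistics) {
                    total += s[c][j];
                }
                for (double[][] s : statistics) {
                    s[c][j] = total;
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] a = { { 1.0, 2.0 } };
        double[][] b = { { 3.0, 4.0 } };
        join(a, b);
        // both arrays now hold the same joined statistic
        System.out.println(java.util.Arrays.deepToString(a));
        System.out.println(java.util.Arrays.deepToString(b));
    }
}
```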

addToStatistic

public void addToStatistic(boolean forward,
                           int startPos,
                           int endPos,
                           double weight,
                           Sequence seq)
                    throws OperationNotSupportedException
Description copied from interface: Emission
This method adds the weight to the internal sufficient statistic.

Specified by:
addToStatistic in interface Emission
Parameters:
forward - whether to use the forward or the reverse strand
startPos - the start position
endPos - the end position
weight - the weight of the sequence
seq - the sequence
Throws:
OperationNotSupportedException - if forward=false and the reverse complement of the sequence seq is not defined
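
As a rough illustration, accumulating the sufficient statistic could look like the sketch below: for every position in the range, the weight is added to the cell selected by the condition index and the observed symbol. The inclusive range and the int-array representation of sequence and conditions are assumptions; StatisticSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class StatisticSketch {

    // Adds weight to statistic[condition][symbol] for each position in
    // startPos..endPos (both inclusive, an assumption of this sketch).
    static void addToStatistic(double[][] statistic, int[] conditions, int[] symbols,
                               int startPos, int endPos, double weight) {
        for (int pos = startPos; pos <= endPos; pos++) {
            statistic[conditions[pos]][symbols[pos]] += weight;
        }
    }

    public static void main(String[] args) {
        double[][] statistic = new double[2][2];
        addToStatistic(statistic, new int[] { 0, 1, 1 }, new int[] { 1, 0, 0 }, 0, 2, 0.5);
        System.out.println(java.util.Arrays.deepToString(statistic));
    }
}
```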

getConditionIndex

protected abstract int getConditionIndex(boolean forward,
                                         int seqPos,
                                         Sequence seq)
This method returns an index encoding the condition.

Parameters:
forward - a switch to decide whether to use the forward or the reverse complementary strand (e.g. for DNA sequences)
seqPos - the position in the sequence seq
seq - the sequence
Returns:
the index encoding the condition

estimateFromStatistic

public void estimateFromStatistic()
Description copied from interface: Emission
This method estimates the parameters from the internal sufficient statistic.

Specified by:
estimateFromStatistic in interface Emission
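
A minimal sketch of such an estimation, assuming that the hyper-parameters act as pseudocounts that are added to the statistic before each condition is normalized to a probability distribution. Whether the class uses exactly this estimator is not stated on this page; EstimateSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class EstimateSketch {

    // For each condition c:
    // probs[c][j] = (statistic[c][j] + hyper[c][j]) / sum_k (statistic[c][k] + hyper[c][k])
    static double[][] estimate(double[][] statistic, double[][] hyperParams) {
        double[][] probs = new double[statistic.length][];
        for (int c = 0; c < statistic.length; c++) {
            probs[c] = new double[statistic[c].length];
            double sum = 0.0;
            for (int j = 0; j < statistic[c].length; j++) {
                probs[c][j] = statistic[c][j] + hyperParams[c][j];
                sum += probs[c][j];
            }
            for (int j = 0; j < probs[c].length; j++) {
                probs[c][j] /= sum;
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        double[][] probs = estimate(new double[][] { { 3.0, 1.0 } },
                                    new double[][] { { 1.0, 1.0 } });
        // (3 + 1) / 6 and (1 + 1) / 6
        System.out.println(java.util.Arrays.deepToString(probs));
    }
}
```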

resetStatistic

public void resetStatistic()
Description copied from interface: Emission
This method resets the internal sufficient statistic.

Specified by:
resetStatistic in interface Emission

setParameter

public void setParameter(double[] params,
                         int offset)
Description copied from interface: DifferentiableEmission
This method sets the internal parameters using the given global parameter array, the global offset of the HMM and the internal offset.

Specified by:
setParameter in interface DifferentiableEmission
Parameters:
params - the global parameter array of the classifier
offset - the offset of the HMM
See Also:
DifferentiableEmission.setParameterOffset(int)

getAlphabetContainer

public AlphabetContainer getAlphabetContainer()
Description copied from interface: Emission
This method returns the AlphabetContainer of this emission.

Specified by:
getAlphabetContainer in interface Emission
Returns:
the AlphabetContainer of this emission

fillCurrentParameter

public void fillCurrentParameter(double[] params)
Description copied from interface: DifferentiableEmission
Fills the current parameters in the global params array using the internal offset.

Specified by:
fillCurrentParameter in interface DifferentiableEmission
Parameters:
params - the global parameter array of the HMM
See Also:
DifferentiableEmission.setParameterOffset(int)

setParameterOffset

public int setParameterOffset(int offset)
Description copied from interface: DifferentiableEmission
This method sets the internal parameter offset and returns the new parameter offset for further use.

Specified by:
setParameterOffset in interface DifferentiableEmission
Parameters:
offset - the offset to be set
Returns:
the new parameter offset
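
The offset mechanism can be sketched as follows, assuming that the returned value is the given offset plus the number of parameters of this emission, so that several components can be chained into one global parameter array. OffsetSketch is a hypothetical stand-in that stores its parameters in a flat array; it is not the library class.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class OffsetSketch {

    private final double[] params; // parameters of this component, flattened
    private int offset;            // start index in the global parameter array

    OffsetSketch(double... params) {
        this.params = params.clone();
    }

    // Stores the offset and returns the first free index behind this
    // component (assumed to be offset + number of parameters).
    int setParameterOffset(int offset) {
        this.offset = offset;
        return offset + params.length;
    }

    // Copies the parameters into the global array at the stored offset.
    void fillCurrentParameter(double[] global) {
        System.arraycopy(params, 0, global, offset, params.length);
    }

    public static void main(String[] args) {
        OffsetSketch first = new OffsetSketch(1.0, 2.0);
        OffsetSketch second = new OffsetSketch(3.0);
        int next = second.setParameterOffset(first.setParameterOffset(0));
        double[] global = new double[next];
        first.fillCurrentParameter(global);
        second.fillCurrentParameter(global);
        System.out.println(java.util.Arrays.toString(global));
    }
}
```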

drawParameters

protected void drawParameters(double[][] hyper,
                              boolean uniformBackup)

drawParametersFromStatistic

public void drawParametersFromStatistic()
Description copied from interface: SamplingFromStatistic
This method draws the parameters using a sufficient statistic representing a posteriori density. It is recommended to write the parameters to a specific file using SamplingComponent.acceptParameters() so that they can later be parsed using the methods of the interface.

Before using this method the method SamplingComponent.initForSampling(int) should be called.

Specified by:
drawParametersFromStatistic in interface SamplingFromStatistic
See Also:
SamplingComponent.initForSampling(int), SamplingComponent.acceptParameters()

getLogGammaScoreFromStatistic

public double getLogGammaScoreFromStatistic()
Description copied from interface: SamplingEmission
This method calculates a score for the current statistics, which is independent of the current parameters. In general, the gamma-score is a product of gamma functions parameterized with the current statistics.

Specified by:
getLogGammaScoreFromStatistic in interface SamplingEmission
Returns:
the logarithm of the gamma-score for the current statistics
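
As a hedged illustration of a "product of gamma functions", the sketch below computes, for each condition, the sum of ln Gamma(statistic + hyper-parameter) over all symbols minus ln Gamma of the row sum, which is the log-normalizer of a Dirichlet density. That this is the exact form used by the class is an assumption. Because the Java standard library has no log-gamma function, the sketch uses the well-known Lanczos approximation (g = 7, nine coefficients); GammaScoreSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class GammaScoreSketch {

    // Lanczos coefficients for g = 7.
    private static final double[] LANCZOS = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
    };

    // ln Gamma(x) for x > 0 via the Lanczos approximation.
    static double logGamma(double x) {
        if (x < 0.5) {
            // reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
            return Math.log(Math.PI / Math.sin(Math.PI * x)) - logGamma(1.0 - x);
        }
        x -= 1.0;
        double a = LANCZOS[0];
        double t = x + 7.5;
        for (int i = 1; i < LANCZOS.length; i++) {
            a += LANCZOS[i] / (x + i);
        }
        return 0.5 * Math.log(2.0 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a);
    }

    // Sum over conditions of [ sum_j lnGamma(alpha_j) - lnGamma(sum_j alpha_j) ]
    // with alpha_j = statistic[c][j] + hyperParams[c][j] (assumed form).
    static double logGammaScore(double[][] statistic, double[][] hyperParams) {
        double score = 0.0;
        for (int c = 0; c < statistic.length; c++) {
            double sum = 0.0;
            for (int j = 0; j < statistic[c].length; j++) {
                double alpha = statistic[c][j] + hyperParams[c][j];
                score += logGamma(alpha);
                sum += alpha;
            }
            score -= logGamma(sum);
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(logGammaScore(new double[][] { { 1.0, 2.0 } },
                                         new double[][] { { 1.0, 1.0 } }));
    }
}
```

The score depends only on the statistics and the hyper-parameters, not on the current parameter values, which matches the description above.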

acceptParameters

public void acceptParameters()
                      throws IOException
Description copied from interface: SamplingComponent
This method accepts the drawn parameters. Internally the drawn parameters should be saved (to a file).

Specified by:
acceptParameters in interface SamplingComponent
Throws:
IOException - if the file could not be handled correctly

getLogPosteriorFromStatistic

public double getLogPosteriorFromStatistic()
Description copied from interface: SamplingFromStatistic
This method calculates the a-posteriori probability for the current statistics.

Specified by:
getLogPosteriorFromStatistic in interface SamplingFromStatistic
Returns:
the logarithm of the a-posteriori probability

extendSampling

public void extendSampling(int start,
                           boolean append)
                    throws IOException
Description copied from interface: SamplingComponent
This method allows extending a sampling.

Specified by:
extendSampling in interface SamplingComponent
Parameters:
start - the index of the sampling
append - whether to append the sampled parameters to an existing file or to overwrite the file
Throws:
IOException - if the file could not be handled correctly

initForSampling

public void initForSampling(int starts)
                     throws IOException
Description copied from interface: SamplingComponent
This method initializes the instance for the sampling. For instance this method can be used to create new files where all parameter sets are stored.

Specified by:
initForSampling in interface SamplingComponent
Parameters:
starts - the number of different sampling starts that will be done
Throws:
IOException - if something went wrong
See Also:
File.createTempFile(String, String, java.io.File )

isInSamplingMode

public boolean isInSamplingMode()
Description copied from interface: SamplingComponent
This method returns true if the object is currently used in a sampling, otherwise false.

Specified by:
isInSamplingMode in interface SamplingComponent
Returns:
true if the object is currently used in a sampling, otherwise false

parseNextParameterSet

public boolean parseNextParameterSet()
Description copied from interface: SamplingComponent
This method allows the user to parse the next set of parameters (from a file).

Specified by:
parseNextParameterSet in interface SamplingComponent
Returns:
true if the parameters could be parsed, otherwise false
See Also:
SamplingComponent.parseParameterSet(int, int)

parseParameterSet

public boolean parseParameterSet(int start,
                                 int n)
                          throws IOException
Description copied from interface: SamplingComponent
This method allows the user to parse the set of parameters with index n of a certain sampling (from a file). The internal numbering should start with 0. The parameter set with index 0 is the initial (random) parameter set. It is recommended that a series of parameter sets is accessed by the following lines:

for( int sampling = 0; sampling < numSampling; sampling++ )
{
    // start at parameter set n of the current sampling
    boolean b = parseParameterSet( sampling, n );
    while( b )
    {
        // do something with the current parameter set
        b = parseNextParameterSet();
    }
}

Specified by:
parseParameterSet in interface SamplingComponent
Parameters:
start - the index of the sampling
n - the index of the parameter set
Returns:
true if the parameter set could be parsed
Throws:
IOException
See Also:
SamplingComponent.parseNextParameterSet()
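
The access pattern recommended above can be tried out without any files using the following sketch. The nested class Source is a hypothetical in-memory stand-in that only mimics the two parsing methods; it is not part of Jstacs, and the real methods read from the parameter files and may throw an IOException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the Jstacs implementation.
final class ParseLoopSketch {

    // In-memory stand-in: samplings[sampling][set] is one parameter set.
    static final class Source {
        private final double[][][] samplings;
        private int sampling;
        private int next;
        double[] current;

        Source(double[][][] samplings) {
            this.samplings = samplings;
        }

        // Jumps to parameter set n of the given sampling.
        boolean parseParameterSet(int start, int n) {
            sampling = start;
            next = n;
            return parseNextParameterSet();
        }

        // Moves to the following set, or returns false if none is left.
        boolean parseNextParameterSet() {
            if (next >= samplings[sampling].length) {
                return false;
            }
            current = samplings[sampling][next++];
            return true;
        }
    }

    // Visits every parameter set with index >= n in every sampling.
    static List<double[]> collect(Source src, int numSampling, int n) {
        List<double[]> seen = new ArrayList<>();
        for (int sampling = 0; sampling < numSampling; sampling++) {
            boolean b = src.parseParameterSet(sampling, n);
            while (b) {
                seen.add(src.current);
                b = src.parseNextParameterSet();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        double[][][] data = { { { 0.1 }, { 0.2 }, { 0.3 } }, { { 0.4 }, { 0.5 } } };
        System.out.println(collect(new Source(data), 2, 1).size());
    }
}
```

Starting at n = 1 instead of n = 0 skips the initial (random) parameter set of each sampling.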

samplingStopped

public void samplingStopped()
                     throws IOException
Description copied from interface: SamplingComponent
This method is the opposite of the method SamplingComponent.extendSampling(int, boolean). It can be used, for instance, to close any open streams or writers.

Specified by:
samplingStopped in interface SamplingComponent
Throws:
IOException - if something went wrong
See Also:
SamplingComponent.extendSampling(int, boolean)

finalize

protected void finalize()
                 throws Throwable
Overrides:
finalize in class Object
Throws:
Throwable

getNodeShape

public String getNodeShape(boolean forward)
Description copied from interface: Emission
Returns the graphviz string for the shape of the node.

Specified by:
getNodeShape in interface Emission
Parameters:
forward - if this emission is used on the forward strand
Returns:
the shape

getNodeLabel

public String getNodeLabel(double weight,
                           String name,
                           NumberFormat nf)
Description copied from interface: Emission
Returns the graphviz label of the node containing this emission.

Specified by:
getNodeLabel in interface Emission
Parameters:
weight - the weight of the node which is represented by the color of the node, or -1 for no representation, i.e., white background
name - the name of the state using this emission
nf - the NumberFormat for formatting the textual representation of this emission
Returns:
the label

setLinear

public void setLinear(boolean linear)
If set to true, the probabilities are mapped to colors directly; otherwise a logistic mapping is used to emphasize deviations from the uniform distribution.

Parameters:
linear - whether to map the probabilities linearly

fillSamplingGroups

public void fillSamplingGroups(int parameterOffset,
                               LinkedList<int[]> list)
Description copied from interface: DifferentiableEmission
Adds the groups of indexes of those parameters of this emission that should be sampled together in one step of a grouped sampling procedure, each as an int[], into list. In most cases, one group should contain the parameters that live on a common simplex. The internal indexes of the parameters are incremented by an external parameterOffset.

Specified by:
fillSamplingGroups in interface DifferentiableEmission
Parameters:
parameterOffset - the external parameter offset
list - the list of sampling groups

getNumberOfParameters

public int getNumberOfParameters()
Description copied from interface: DifferentiableEmission
Returns the number of parameters of this emission.

Specified by:
getNumberOfParameters in interface DifferentiableEmission
Returns:
the number of parameters

getSizeOfEventSpace

public int getSizeOfEventSpace()
Description copied from interface: DifferentiableEmission
Returns the size of the event space, i.e., the number of possible outcomes, for the random variables of this emission.

Specified by:
getSizeOfEventSpace in interface DifferentiableEmission
Returns:
the size of the event space

setParameters

public void setParameters(Emission t)
                   throws IllegalArgumentException
Description copied from interface: Emission
Sets the values of the parameters of this instance to the values of the parameters of the given instance. It can be assumed that the given instance and the current instance are from the same class. This method might be used, for instance, in a multi-threaded optimization to broadcast the parameters.

Specified by:
setParameters in interface Emission
Parameters:
t - the emission with the parameters to be set
Throws:
IllegalArgumentException - if the assumption about the same class for given and current instance is wrong