de.jstacs.sequenceScores.statisticalModels.differentiable
Class CyclicMarkovModelDiffSM

java.lang.Object
  extended by de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
      extended by de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
          extended by de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractVariableLengthDiffSM
              extended by de.jstacs.sequenceScores.statisticalModels.differentiable.CyclicMarkovModelDiffSM
All Implemented Interfaces:
DifferentiableSequenceScore, SequenceScore, DifferentiableStatisticalModel, SamplingDifferentiableStatisticalModel, VariableLengthDiffSM, StatisticalModel, Storable, Cloneable

public class CyclicMarkovModelDiffSM
extends AbstractVariableLengthDiffSM
implements SamplingDifferentiableStatisticalModel

This scoring function implements a cyclic Markov model of arbitrary order and periodicity for any sequence length. The scoring function uses the parametrization of Meila.
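
To make the idea of a cyclic (periodic) Markov model concrete, the following is a minimal sketch and not the Jstacs implementation. It assumes order 1, normalized probability tables instead of the parametrization used by this class, and a fixed, known starting frame (the class itself carries frame parameters). The class name CyclicMarkovSketch, the method logScore, and the tables initial and cond are hypothetical names introduced only for illustration.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class CyclicMarkovSketch {

    /**
     * Log score of seq[start..end] (end inclusive) under a periodic first-order chain.
     *
     * @param seq        symbols encoded as 0..A-1
     * @param initial    initial[frame][a]: probability of symbol a at the first scored position
     * @param cond       cond[frame][prev][a]: probability of a given prev in that frame
     * @param startFrame frame assigned to position start (assumed known here)
     */
    static double logScore(int[] seq, int start, int end,
                           double[][] initial, double[][][] cond, int startFrame) {
        int period = cond.length;
        int frame = startFrame % period;
        double logScore = Math.log(initial[frame][seq[start]]);
        for (int i = start + 1; i <= end; i++) {
            // the frame advances by one per position and wraps around after 'period' steps
            frame = (frame + 1) % period;
            logScore += Math.log(cond[frame][seq[i - 1]][seq[i]]);
        }
        return logScore;
    }
}
```

The same conditional table is reused every period positions, which is what allows a model of this kind to score sequences of any length.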

Author:
Jens Keilwagen

Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
alphabets, length, r
 
Fields inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
UNKNOWN
 
Constructor Summary
CyclicMarkovModelDiffSM(AlphabetContainer alphabets, double[] frameHyper, double[][][] hyper, boolean plugIn, boolean optimize, int starts, int initFrame)
          This constructor creates an instance with specific hyper-parameters for all conditional distributions.
CyclicMarkovModelDiffSM(AlphabetContainer alphabets, int order, int period, double classEss, double[] sumOfHyperParams, boolean plugIn, boolean optimize, int starts, int initFrame)
          The main constructor.
CyclicMarkovModelDiffSM(StringBuffer source)
          This is the constructor for Storable.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model.
 CyclicMarkovModelDiffSM clone()
          Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.
protected  void fromXML(StringBuffer xml)
          This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.
 double[] getCurrentParameterValues()
          Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values.
 double getESS()
          Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
static double[][][] getHyperParams(int alphabetSize, int length, double ess, double[] frameProb, double[][][] prob)
          This method returns the hyper-parameters for a model given some a-priori probabilities.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 double getLogNormalizationConstant(int length)
          This method returns the logarithm of the normalization constant for a given sequence length.
 double getLogPartialNormalizationConstant(int parameterIndex, int length)
          This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length.
 double getLogPriorTerm()
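          This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ), where prior is the prior for the parameters of this model.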
 double getLogScoreAndPartialDerivation(Sequence seq, int start, int end, IntList indices, DoubleList dList)
          Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivations.
 double getLogScoreFor(Sequence seq, int start, int end)
          Returns the logarithmic score for the Sequence seq between position start and position end (inclusive).
 int getNumberOfParameters()
          Returns the number of parameters in this DifferentiableSequenceScore.
 int getNumberOfRecommendedStarts()
          This method returns the number of recommended optimization starts.
 int[][] getSamplingGroups(int parameterOffset)
          Returns groups of indexes of parameters that shall be drawn together in a sampling procedure
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by parameter no. index.
 void initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights)
          This method creates the underlying structure of the DifferentiableSequenceScore.
 void initializeFunctionRandomly(boolean freeParams)
          This method initializes the DifferentiableSequenceScore randomly.
 boolean isInitialized()
          This method can be used to determine whether the instance is initialized.
 boolean isNormalized()
          This method indicates whether the implemented score is already normalized to 1 or not.
 void setFrameParameterOptimization(boolean optimize)
          This method enables the user to choose whether the frame parameters should be optimized or not.
 void setParameterOptimization(boolean optimize)
          This method enables the user to choose whether the parameters should be optimized or not.
 void setParameters(double[] params, int start)
          This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1
 void setStatisticForHyperparameters(int[] length, double[] weight)
          This method sets the hyperparameters for the model parameters by evaluating the given statistic.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractVariableLengthDiffSM
getLogNormalizationConstant, getLogPartialNormalizationConstant, getLogScoreAndPartialDerivation, getLogScoreFor
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
emitDataSet, getInitialClassParam, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, isNormalized
 
Methods inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreAndPartialDerivation, getLogScoreFor, getNumberOfStarts, getNumericalCharacteristics
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.differentiable.DifferentiableStatisticalModel
getLogNormalizationConstant, getLogPartialNormalizationConstant
 
Methods inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
getInitialClassParam, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel
emitDataSet, getLogProbFor, getLogProbFor, getLogProbFor, getMaximalMarkovOrder
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics
 

Constructor Detail

CyclicMarkovModelDiffSM

public CyclicMarkovModelDiffSM(AlphabetContainer alphabets,
                               int order,
                               int period,
                               double classEss,
                               double[] sumOfHyperParams,
                               boolean plugIn,
                               boolean optimize,
                               int starts,
                               int initFrame)
The main constructor.

Parameters:
alphabets - the alphabet container
order - the order of the model (has to be non-negative)
period - the period
classEss - the ess of the class
sumOfHyperParams - the sum of the hyper-parameters for each order (the length has to be order+1 and each entry has to be non-negative); the sum also runs over the period
plugIn - a switch that enables the use of the MAP parameters as plug-in parameters
optimize - a switch that determines whether the parameters are optimized or kept fixed
starts - the number of recommended starts
initFrame - the frame which should be used for plug-in initialization, negative for random initialization
See Also:
getHyperParams(int, int, double, double[], double[][][]), CyclicMarkovModelDiffSM(AlphabetContainer, double[], double[][][], boolean, boolean, int, int)
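
The constraints stated for order and sumOfHyperParams can be checked before calling the constructor. The following is a minimal sketch of such a check, written in plain Java. The class ConstructorArgsSketch, the method check, and the choice of IllegalArgumentException are assumptions made for illustration; how Jstacs itself reacts to invalid arguments is not stated here.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class ConstructorArgsSketch {

    /** Verifies the documented constraints on order and sumOfHyperParams. */
    static void check(int order, double[] sumOfHyperParams) {
        if (order < 0) {
            throw new IllegalArgumentException("order has to be non-negative");
        }
        // one entry per order 0..order, hence order+1 entries
        if (sumOfHyperParams.length != order + 1) {
            throw new IllegalArgumentException("sumOfHyperParams needs order+1 entries");
        }
        for (double s : sumOfHyperParams) {
            if (s < 0) {
                throw new IllegalArgumentException("each entry has to be non-negative");
            }
        }
    }
}
```

For example, an order of 1 requires an array with exactly two non-negative entries.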

CyclicMarkovModelDiffSM

public CyclicMarkovModelDiffSM(AlphabetContainer alphabets,
                               double[] frameHyper,
                               double[][][] hyper,
                               boolean plugIn,
                               boolean optimize,
                               int starts,
                               int initFrame)
This constructor creates an instance with specific hyper-parameters for all conditional distributions.

Parameters:
alphabets - the alphabet container
frameHyper - the hyper-parameters for the frame, the length of this array also defines the period of the model
hyper - the hyper-parameters for each frame
plugIn - a switch that enables the use of the MAP parameters as plug-in parameters
optimize - a switch that determines whether the parameters are optimized or kept fixed
starts - the number of recommended starts
initFrame - the frame which should be used for plug-in initialization, negative for random initialization

CyclicMarkovModelDiffSM

public CyclicMarkovModelDiffSM(StringBuffer source)
                        throws NonParsableException
This is the constructor for Storable.

Parameters:
source - the XML representation
Throws:
NonParsableException - if the representation could not be parsed.
Method Detail

getHyperParams

public static double[][][] getHyperParams(int alphabetSize,
                                          int length,
                                          double ess,
                                          double[] frameProb,
                                          double[][][] prob)
This method returns the hyper-parameters for a model given some a-priori probabilities.

Parameters:
alphabetSize - the size of the alphabet
length - the expected sequence length
ess - the equivalent sample size (ess) of the model
frameProb - the a-priori probabilities for each frame
prob - the a-priori probabilities for each frame and order
Returns:
specific hyper-parameters

clone

public CyclicMarkovModelDiffSM clone()
                              throws CloneNotSupportedException
Description copied from interface: DifferentiableSequenceScore
Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.

Specified by:
clone in interface DifferentiableSequenceScore
Specified by:
clone in interface SequenceScore
Overrides:
clone in class AbstractDifferentiableStatisticalModel
Returns:
the cloned instance of the current DifferentiableSequenceScore
Throws:
CloneNotSupportedException - if something went wrong while cloning the DifferentiableSequenceScore

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Returns:
a short instance name

getLogScoreFor

public double getLogScoreFor(Sequence seq,
                             int start,
                             int end)
Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq between position start and position end (inclusive).

Specified by:
getLogScoreFor in interface SequenceScore
Specified by:
getLogScoreFor in interface VariableLengthDiffSM
Specified by:
getLogScoreFor in class AbstractVariableLengthDiffSM
Parameters:
seq - the Sequence
start - the start position in the Sequence
end - the end position (inclusive) in the Sequence
Returns:
the logarithmic score for the Sequence

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              int end,
                                              IntList indices,
                                              DoubleList dList)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivations.

Specified by:
getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
Specified by:
getLogScoreAndPartialDerivation in interface VariableLengthDiffSM
Specified by:
getLogScoreAndPartialDerivation in class AbstractVariableLengthDiffSM
Parameters:
seq - the Sequence
start - the start position in the Sequence
end - the end position (inclusive) in the Sequence
indices - an IntList of indices, after method invocation the list should contain the indices i where $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
dList - a DoubleList of partial derivations, after method invocation the list should contain the corresponding $\frac{\partial \log score(seq)}{\partial \lambda_i}$ that are not zero
Returns:
the logarithmic score for the Sequence
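
The contract of the two lists (only indices with a non-zero partial derivation are reported, together with the corresponding values) can be illustrated with a small sketch. It is not the Jstacs implementation: it assumes an order-0 model whose log score is simply the sum of one parameter per observed symbol, and it uses java.util.List as a stand-in for IntList and DoubleList. All names in the block are hypothetical.

```java
import java.util.List;

/** Illustrative sketch only; not part of Jstacs. */
final class SparseGradientSketch {

    /**
     * Scores seq[start..end] (end inclusive) and reports the non-zero partial derivations.
     * Assumption: log score = sum over positions of lambda[symbol].
     */
    static double logScoreAndPartialDerivation(int[] seq, int start, int end, double[] lambda,
                                               List<Integer> indices, List<Double> dList) {
        double[] counts = new double[lambda.length];
        double logScore = 0;
        for (int i = start; i <= end; i++) {
            logScore += lambda[seq[i]];
            counts[seq[i]]++; // derivative w.r.t. lambda[a] is the number of times a occurs
        }
        for (int p = 0; p < counts.length; p++) {
            if (counts[p] != 0) { // only non-zero entries are appended
                indices.add(p);
                dList.add(counts[p]);
            }
        }
        return logScore;
    }
}
```

Reporting only the non-zero entries keeps the gradient sparse, which matters when a model has many parameters but a single sequence touches only a few of them.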

getNumberOfParameters

public int getNumberOfParameters()
Description copied from interface: DifferentiableSequenceScore
Returns the number of parameters in this DifferentiableSequenceScore. If the number of parameters is not known yet, the method returns DifferentiableSequenceScore.UNKNOWN.

Specified by:
getNumberOfParameters in interface DifferentiableSequenceScore
Returns:
the number of parameters in this DifferentiableSequenceScore
See Also:
DifferentiableSequenceScore.UNKNOWN

setParameters

public void setParameters(double[] params,
                          int start)
Description copied from interface: DifferentiableSequenceScore
This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.

Specified by:
setParameters in interface DifferentiableSequenceScore
Parameters:
params - the new parameters
start - the start index in params
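
The indexing convention (a model reads its own slice out of a larger, shared parameter vector) can be sketched as follows. This is a plain-Java illustration under the assumption that the internal parameters are stored in a single flat array; the class ParameterSliceSketch is hypothetical and says nothing about how this class stores its parameters internally.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class ParameterSliceSketch {

    private final double[] internal;

    ParameterSliceSketch(int numberOfParameters) {
        internal = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return internal.length;
    }

    /** Copies params[start .. start + getNumberOfParameters() - 1] into the internal array. */
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, internal, 0, internal.length);
    }

    /** Returns a copy so that callers cannot modify the internal state. */
    double[] getCurrentParameterValues() {
        return internal.clone();
    }
}
```

With three parameters and start = 2, the entries at indices 2, 3, and 4 of params are used and all other entries are ignored.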

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Returns:
the XML representation

getCurrentParameterValues

public double[] getCurrentParameterValues()
Description copied from interface: DifferentiableSequenceScore
Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. To use these parameters as the starting point of an optimization, it is highly recommended to invoke DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][]) first. After an optimization, this method can be used to get the current parameter values.

Specified by:
getCurrentParameterValues in interface DifferentiableSequenceScore
Returns:
the current parameter values

initializeFunction

public void initializeFunction(int index,
                               boolean freeParams,
                               DataSet[] data,
                               double[][] weights)
Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.

Specified by:
initializeFunction in interface DifferentiableSequenceScore
Parameters:
index - the index of the class the DifferentiableSequenceScore models
freeParams - indicates whether the (reduced) parameterization is used
data - the data sets
weights - the weights of the sequences in the data sets

initializeFunctionRandomly

public void initializeFunctionRandomly(boolean freeParams)
Description copied from interface: DifferentiableSequenceScore
This method initializes the DifferentiableSequenceScore randomly. It has to create the underlying structure of the DifferentiableSequenceScore.

Specified by:
initializeFunctionRandomly in interface DifferentiableSequenceScore
Parameters:
freeParams - indicates whether the (reduced) parameterization is used

fromXML

protected void fromXML(StringBuffer xml)
                throws NonParsableException
Description copied from class: AbstractDifferentiableSequenceScore
This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.

Specified by:
fromXML in class AbstractDifferentiableSequenceScore
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
AbstractDifferentiableSequenceScore.AbstractDifferentiableSequenceScore(StringBuffer)

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Specified by:
getSizeOfEventSpaceForRandomVariablesOfParameter in interface DifferentiableStatisticalModel
Parameters:
index - the index of the parameter
Returns:
the size of the event space
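
The product described above can be written down directly. The following sketch is an illustration only; the class EventSpaceSketch and its method are hypothetical, and the caller is assumed to supply the alphabet sizes of the random variables affected by the parameter.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class EventSpaceSketch {

    /** Product of the alphabet sizes of all random variables affected by one parameter. */
    static int sizeOfEventSpace(int[] alphabetSizes) {
        int size = 1;
        for (int a : alphabetSizes) {
            size *= a;
        }
        return size;
    }
}
```

For a DNA alphabet, one affected position gives 4 (the PWM case) and two affected positions give 16 (the WAM case beyond position 0).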

getLogNormalizationConstant

public double getLogNormalizationConstant(int length)
Description copied from interface: VariableLengthDiffSM
This method returns the logarithm of the normalization constant for a given sequence length.

Specified by:
getLogNormalizationConstant in interface VariableLengthDiffSM
Parameters:
length - the sequence length
Returns:
the logarithm of the normalization constant
See Also:
DifferentiableStatisticalModel.getLogNormalizationConstant()

getLogPartialNormalizationConstant

public double getLogPartialNormalizationConstant(int parameterIndex,
                                                 int length)
                                          throws Exception
Description copied from interface: VariableLengthDiffSM
This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length.

Specified by:
getLogPartialNormalizationConstant in interface VariableLengthDiffSM
Parameters:
parameterIndex - the index of the parameter
length - the sequence length
Returns:
the logarithm of the partial normalization constant
Throws:
Exception - if something went wrong
See Also:
DifferentiableStatisticalModel.getLogPartialNormalizationConstant(int)

getESS

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Specified by:
getESS in interface DifferentiableStatisticalModel
Returns:
the equivalent sample size.

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: DifferentiableStatisticalModel
This method computes a value that is proportional to

DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior )

where prior is the prior for the parameters of this model.

Specified by:
getLogPriorTerm in interface DifferentiableStatisticalModel
Specified by:
getLogPriorTerm in interface StatisticalModel
Returns:
a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ).
See Also:
DifferentiableStatisticalModel.getESS(), DifferentiableStatisticalModel.getLogNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Specified by:
addGradientOfLogPriorTerm in interface DifferentiableStatisticalModel
Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivations for the parameters of this model are to be entered
See Also:
DifferentiableStatisticalModel.getLogPriorTerm()
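
Note that the results are added to grad rather than written over it, so contributions already stored in the array are preserved. A minimal sketch of this accumulation pattern, assuming the model's own gradient is available as a hypothetical array localGradient, could look like this; it does not compute any actual prior gradient.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class PriorGradientSketch {

    /** Adds localGradient[i] to grad[start + i]; entries outside that range stay untouched. */
    static void addGradient(double[] localGradient, double[] grad, int start) {
        for (int i = 0; i < localGradient.length; i++) {
            grad[start + i] += localGradient[i];
        }
    }
}
```

This convention lets several models share one gradient array, each contributing to its own range of indices.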

isNormalized

public boolean isNormalized()
Description copied from interface: DifferentiableStatisticalModel
This method indicates whether the implemented score is already normalized to 1 or not. The standard implementation returns false.

Specified by:
isNormalized in interface DifferentiableStatisticalModel
Overrides:
isNormalized in class AbstractDifferentiableStatisticalModel
Returns:
true if the implemented score is already normalized to 1, false otherwise

isInitialized

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized you should be able to invoke SequenceScore.getLogScoreFor(Sequence).

Specified by:
isInitialized in interface SequenceScore
Returns:
true if the instance is initialized, false otherwise

getNumberOfRecommendedStarts

public int getNumberOfRecommendedStarts()
Description copied from interface: DifferentiableSequenceScore
This method returns the number of recommended optimization starts. The standard implementation returns 1.

Specified by:
getNumberOfRecommendedStarts in interface DifferentiableSequenceScore
Overrides:
getNumberOfRecommendedStarts in class AbstractDifferentiableSequenceScore
Returns:
the number of recommended optimization starts

setParameterOptimization

public void setParameterOptimization(boolean optimize)
This method enables the user to choose whether the parameters should be optimized or not.

Parameters:
optimize - the switch for optimization of the parameters

setFrameParameterOptimization

public void setFrameParameterOptimization(boolean optimize)
This method enables the user to choose whether the frame parameters should be optimized or not.

Parameters:
optimize - the switch for optimization of the frame parameters

setStatisticForHyperparameters

public void setStatisticForHyperparameters(int[] length,
                                           double[] weight)
                                    throws Exception
Description copied from interface: VariableLengthDiffSM
This method sets the hyperparameters for the model parameters by evaluating the given statistic. The statistic can be interpreted as follows: The model has seen a number of sequences. From these sequences it is only known how long (length) and how often (weight) they have been seen.

Specified by:
setStatisticForHyperparameters in interface VariableLengthDiffSM
Parameters:
length - the non-negative lengths of the sequences
weight - the non-negative weight for the corresponding sequence
Throws:
Exception - if something went wrong
See Also:
Mutable

getSamplingGroups

public int[][] getSamplingGroups(int parameterOffset)
Description copied from interface: SamplingDifferentiableStatisticalModel
Returns groups of indexes of parameters that shall be drawn together in a sampling procedure.

Specified by:
getSamplingGroups in interface SamplingDifferentiableStatisticalModel
Parameters:
parameterOffset - a global offset on the parameter indexes
Returns:
the groups of indexes. The first dimension represents the different groups while the second dimension contains the parameters that shall be sampled together. Internal parameter indexes need to be increased by parameterOffset.
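
The role of parameterOffset can be illustrated with a short sketch: internal indexes start at 0 within one model and are shifted to their position in a global parameter vector. The class SamplingGroupsSketch and the input internalGroups are hypothetical; which parameters this class actually groups together is not shown here.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class SamplingGroupsSketch {

    /** Returns a copy of internalGroups in which every index is increased by parameterOffset. */
    static int[][] withOffset(int[][] internalGroups, int parameterOffset) {
        int[][] shifted = new int[internalGroups.length][];
        for (int g = 0; g < internalGroups.length; g++) {
            shifted[g] = new int[internalGroups[g].length];
            for (int i = 0; i < internalGroups[g].length; i++) {
                shifted[g][i] = internalGroups[g][i] + parameterOffset;
            }
        }
        return shifted;
    }
}
```

With internal groups {0, 1} and {2, 3, 4} and an offset of 10, the result is {10, 11} and {12, 13, 14}.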