de.jstacs.scoringFunctions.homogeneous
Class HMMScoringFunction

java.lang.Object
  extended by de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
      extended by de.jstacs.scoringFunctions.AbstractVariableLengthScoringFunction
          extended by de.jstacs.scoringFunctions.homogeneous.HomogeneousScoringFunction
              extended by de.jstacs.scoringFunctions.homogeneous.HMMScoringFunction
All Implemented Interfaces:
NormalizableScoringFunction, ScoringFunction, VariableLengthScoringFunction, Storable, Cloneable

public class HMMScoringFunction
extends HomogeneousScoringFunction

This scoring function implements a homogeneous Markov model of arbitrary order for any sequence length. If the free parameters are used, the scoring function uses the parameterization of Meila, which yields a non-concave log conditional likelihood.

Author:
Jens Keilwagen

Field Summary
 
Fields inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
alphabets, length, r
 
Fields inherited from interface de.jstacs.scoringFunctions.ScoringFunction
UNKNOWN
 
Constructor Summary
HMMScoringFunction(AlphabetContainer alphabets, int order, double classEss, double[] sumOfHyperParams, boolean plugIn, boolean optimize, int starts)
          This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order.
HMMScoringFunction(AlphabetContainer alphabets, int order, double classEss, int length)
          This is a convenience constructor for creating an instance of a homogeneous Markov model of arbitrary order.
HMMScoringFunction(StringBuffer xml)
          This is the constructor for Storable.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of NormalizableScoringFunction.getLogPriorTerm() for each parameter of this model.
 HMMScoringFunction clone()
          Creates a clone (deep copy) of the current ScoringFunction instance.
 Sample emit(int numberOfSequences, int... seqLength)
          This method returns a Sample object containing artificial sequence(s).
protected  void fromXML(StringBuffer xml)
          This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.
 double[][][] getAllConditionalStationaryDistributions()
          This method returns the stationary conditional distributions.
 double[] getCurrentParameterValues()
          Returns a double array of dimension ScoringFunction.getNumberOfParameters() containing the current parameter values.
 double getEss()
          Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
 String getInstanceName()
          Returns a short instance name.
 double getLogNormalizationConstant(int length)
          This method returns the logarithm of the normalization constant for a given sequence length.
 double getLogPartialNormalizationConstant(int parameterIndex, int length)
          This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length.
 double getLogPriorTerm()
          This method computes a value that is proportional to NormalizableScoringFunction.getEss() * NormalizableScoringFunction.getLogNormalizationConstant() + Math.log( prior ) where prior is the prior for the parameters of this model.
 double getLogScore(Sequence seq, int start, int length)
          This method computes the logarithm of the score for a given subsequence.
 double getLogScoreAndPartialDerivation(Sequence seq, int start, int length, IntList indices, DoubleList dList)
          This method computes the logarithm of the score and the partial derivatives for a given subsequence.
 int getMaximalMarkovOrder()
          Returns the maximal Markov order used.
 int getNumberOfParameters()
          Returns the number of parameters in this ScoringFunction.
 int getNumberOfRecommendedStarts()
          This method returns the number of recommended optimization starts.
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by parameter no. index.
static double[] getSumOfHyperParameters(int order, int length, double ess)
          This method returns an array that can be used in the constructor HMMScoringFunction(AlphabetContainer, int, double, double[], boolean, boolean, int) containing the sums of the specific hyperparameters.
 void initializeFunction(int index, boolean freeParams, Sample[] data, double[][] weights)
          This method creates the underlying structure of the ScoringFunction.
 void initializeFunctionRandomly(boolean freeParams)
          This method initializes the ScoringFunction randomly.
 void initializeUniformly(boolean freeParams)
          This method initializes the instance with a uniform distribution.
 boolean isInitialized()
          This method can be used to determine whether the model is initialized.
 boolean isNormalized()
          This method indicates whether the implemented score is already normalized to 1 or not.
 void setParameterOptimization(boolean optimize)
          This method allows the user to specify whether the parameters should be optimized or not.
 void setParameters(double[] params, int start)
          This method sets the internal parameters to the values of params between the indices start and start + ScoringFunction.getNumberOfParameters() - 1.
 void setStatisticForHyperparameters(int[] length, double[] weight)
          This method sets the hyperparameters for the model parameters by evaluating the given statistic.
 String toString()
           
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class de.jstacs.scoringFunctions.AbstractVariableLengthScoringFunction
getLogNormalizationConstant, getLogPartialNormalizationConstant, getLogScore, getLogScoreAndPartialDerivation
 
Methods inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
getAlphabetContainer, getInitialClassParam, getLength, getLogScore, getLogScoreAndPartialDerivation, getNumberOfStarts, isNormalized
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.scoringFunctions.NormalizableScoringFunction
getInitialClassParam
 
Methods inherited from interface de.jstacs.scoringFunctions.ScoringFunction
getAlphabetContainer, getLength, getLogScore, getLogScoreAndPartialDerivation
 

Constructor Detail

HMMScoringFunction

public HMMScoringFunction(AlphabetContainer alphabets,
                          int order,
                          double classEss,
                          int length)
This is a convenience constructor for creating an instance of a homogeneous Markov model of arbitrary order.

Parameters:
alphabets - the AlphabetContainer
order - the order of the model (has to be non-negative)
classEss - the equivalent sample size (ess) of the class
length - the sequence length (only used for computing the hyperparameters)
See Also:
getSumOfHyperParameters(int, int, double), HMMScoringFunction(AlphabetContainer, int, double, double[], boolean, boolean, int)

HMMScoringFunction

public HMMScoringFunction(AlphabetContainer alphabets,
                          int order,
                          double classEss,
                          double[] sumOfHyperParams,
                          boolean plugIn,
                          boolean optimize,
                          int starts)
This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order.

Parameters:
alphabets - the AlphabetContainer
order - the order of the model (has to be non-negative)
classEss - the equivalent sample size (ess) of the class
sumOfHyperParams - the sum of the hyperparameters for each order (length has to be order, each entry has to be non-negative)
plugIn - a switch that enables the use of the MAP parameters as plug-in parameters
optimize - a switch that determines whether the parameters are optimized or kept fixed
starts - the number of recommended starts

HMMScoringFunction

public HMMScoringFunction(StringBuffer xml)
                   throws NonParsableException
This is the constructor for Storable. Creates a new HMMScoringFunction out of its XML representation as returned by toXML().

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the StringBuffer representation could not be parsed
Method Detail

getSumOfHyperParameters

public static double[] getSumOfHyperParameters(int order,
                                               int length,
                                               double ess)
This method returns an array that can be used in the constructor HMMScoringFunction(AlphabetContainer, int, double, double[], boolean, boolean, int) containing the sums of the specific hyperparameters.

Parameters:
order - the order of the model
length - the sequence length
ess - the class ESS
Returns:
an array containing the sums of the specific hyperparameters
See Also:
HMMScoringFunction(AlphabetContainer, int, double, double[], boolean, boolean, int)

clone

public HMMScoringFunction clone()
                         throws CloneNotSupportedException
Description copied from interface: ScoringFunction
Creates a clone (deep copy) of the current ScoringFunction instance.

Specified by:
clone in interface ScoringFunction
Overrides:
clone in class AbstractNormalizableScoringFunction
Returns:
the cloned instance of the current ScoringFunction
Throws:
CloneNotSupportedException - if something went wrong while cloning the ScoringFunction

getInstanceName

public String getInstanceName()
Description copied from interface: ScoringFunction
Returns a short instance name.

Returns:
a short instance name

getLogScore

public double getLogScore(Sequence seq,
                          int start,
                          int length)
Description copied from interface: VariableLengthScoringFunction
This method computes the logarithm of the score for a given subsequence.

Parameters:
seq - the Sequence
start - the start index in the Sequence
length - the length of the subsequence beginning at start
Returns:
the logarithm of the score for the subsequence
See Also:
ScoringFunction.getLogScore(Sequence, int)
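
The following is a minimal standalone sketch of what a log score for a subsequence can look like under a homogeneous first-order Markov model. It does not use the Jstacs classes: the class name MarkovLogScoreSketch, the integer encoding of symbols, and the arrays initial and transition are assumptions made for illustration, and the actual parameterization of HMMScoringFunction may differ.

```java
class MarkovLogScoreSketch {

    /**
     * Log-probability of seq[start .. start+length-1] under a homogeneous
     * first-order Markov model (hypothetical helper, not part of Jstacs).
     *
     * @param seq        symbols encoded as 0 .. A-1
     * @param start      index of the first scored symbol
     * @param length     number of symbols to score
     * @param initial    initial[a] = P(a), used for the first scored symbol
     * @param transition transition[a][b] = P(b | a), identical at every position
     */
    static double logScore(int[] seq, int start, int length,
                           double[] initial, double[][] transition) {
        if (length == 0) {
            return 0.0; // the empty subsequence has probability 1
        }
        double logScore = Math.log(initial[seq[start]]);
        for (int i = start + 1; i < start + length; i++) {
            // homogeneity: the same transition table is used at every position
            logScore += Math.log(transition[seq[i - 1]][seq[i]]);
        }
        return logScore;
    }

    public static void main(String[] args) {
        double[] initial = {0.5, 0.5};
        double[][] transition = {{0.9, 0.1}, {0.2, 0.8}};
        int[] seq = {0, 0, 1, 1};
        // scores the subsequence (0, 1, 1) that begins at index 1
        System.out.println(logScore(seq, 1, 3, initial, transition));
    }
}
```

Because the transition table does not depend on the position, the same function can score subsequences of any length, which is the property the class description refers to.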

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              int length,
                                              IntList indices,
                                              DoubleList dList)
Description copied from interface: VariableLengthScoringFunction
This method computes the logarithm of the score and the partial derivatives for a given subsequence.

Parameters:
seq - the Sequence
start - the start index in the Sequence
length - the length of the subsequence beginning at start
indices - an IntList of indices; after method invocation the list should contain the indices i where $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
dList - a DoubleList of partial derivatives; after method invocation the list should contain the corresponding $\frac{\partial \log score(seq)}{\partial \lambda_i}$ that are not zero
Returns:
the logarithm of the score
See Also:
ScoringFunction.getLogScoreAndPartialDerivation(Sequence, int, IntList, DoubleList)
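
The sparse output convention (only non-zero partial derivatives are reported, as parallel lists of indices and values) can be sketched with a toy log-linear score. Everything in this block is an assumption for illustration: java.util.List stands in for IntList and DoubleList, length is taken to be the number of symbols scored, one parameter lambda[a * A + b] is assigned to each transition a -> b, and the term for the first symbol is omitted. It is not the Jstacs implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SparseGradientSketch {

    /**
     * Toy score: log score = sum of lambda[k] over all transitions in the
     * subsequence, so d(log score)/d(lambda_k) is the number of times
     * transition k occurs. Only non-zero derivatives are appended.
     */
    static double logScoreAndPartialDerivation(int[] seq, int start, int length,
                                               int alphabetSize, double[] lambda,
                                               List<Integer> indices, List<Double> dList) {
        Map<Integer, Double> counts = new TreeMap<>(); // parameter index -> usage count
        double logScore = 0.0;
        for (int i = start + 1; i < start + length; i++) {
            int k = seq[i - 1] * alphabetSize + seq[i];
            logScore += lambda[k];
            counts.merge(k, 1.0, Double::sum);
        }
        for (Map.Entry<Integer, Double> e : counts.entrySet()) {
            indices.add(e.getKey());   // index i with a non-zero derivative
            dList.add(e.getValue());   // the corresponding derivative
        }
        return logScore;
    }

    public static void main(String[] args) {
        List<Integer> indices = new java.util.ArrayList<>();
        List<Double> dList = new java.util.ArrayList<>();
        double[] lambda = {0.1, 0.2, 0.3, 0.4};
        double s = logScoreAndPartialDerivation(new int[]{0, 1, 1, 1}, 0, 4, 2,
                lambda, indices, dList);
        System.out.println(s + " " + indices + " " + dList);
    }
}
```

Reporting only the touched parameters keeps the gradient computation proportional to the subsequence length rather than to the total number of parameters.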

getNumberOfParameters

public int getNumberOfParameters()
Description copied from interface: ScoringFunction
Returns the number of parameters in this ScoringFunction. If the number of parameters is not known yet, the method returns ScoringFunction.UNKNOWN.

Returns:
the number of parameters in this ScoringFunction
See Also:
ScoringFunction.UNKNOWN

setParameters

public void setParameters(double[] params,
                          int start)
Description copied from interface: ScoringFunction
This method sets the internal parameters to the values of params between the indices start and start + ScoringFunction.getNumberOfParameters() - 1.

Parameters:
params - the new parameters
start - the start index in params
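
The offset convention can be illustrated with a standalone sketch in which several components read their parameters from one joint array. The class ParameterSliceSketch is hypothetical and only mirrors the method names of this page; it is not a Jstacs class.

```java
import java.util.Arrays;

class ParameterSliceSketch {
    private final double[] parameters;

    ParameterSliceSketch(int numberOfParameters) {
        parameters = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return parameters.length;
    }

    /** Copies params[start .. start + getNumberOfParameters() - 1] into this component. */
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, parameters, 0, parameters.length);
    }

    double[] getCurrentParameterValues() {
        return parameters.clone(); // defensive copy
    }

    public static void main(String[] args) {
        double[] joint = {0.1, 0.2, 0.3, 0.4, 0.5};
        ParameterSliceSketch first = new ParameterSliceSketch(2);
        ParameterSliceSketch second = new ParameterSliceSketch(3);
        int offset = 0;
        first.setParameters(joint, offset);
        offset += first.getNumberOfParameters(); // next component starts here
        second.setParameters(joint, offset);
        System.out.println(Arrays.toString(second.getCurrentParameterValues()));
    }
}
```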

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Returns:
the XML representation

getCurrentParameterValues

public double[] getCurrentParameterValues()
Description copied from interface: ScoringFunction
Returns a double array of dimension ScoringFunction.getNumberOfParameters() containing the current parameter values. If these parameters are to be used as the starting point of an optimization, it is highly recommended to invoke ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]) first. After an optimization, this method can be used to obtain the current parameter values.

Returns:
the current parameter values

initializeFunction

public void initializeFunction(int index,
                               boolean freeParams,
                               Sample[] data,
                               double[][] weights)
Description copied from interface: ScoringFunction
This method creates the underlying structure of the ScoringFunction.

Parameters:
index - the index of the class the ScoringFunction models
freeParams - indicates whether the (reduced) parameterization is used
data - the samples
weights - the weights of the sequences in the samples

initializeFunctionRandomly

public void initializeFunctionRandomly(boolean freeParams)
Description copied from interface: ScoringFunction
This method initializes the ScoringFunction randomly. It has to create the underlying structure of the ScoringFunction.

Parameters:
freeParams - indicates whether the (reduced) parameterization is used

fromXML

protected void fromXML(StringBuffer xml)
                throws NonParsableException
Description copied from class: AbstractNormalizableScoringFunction
This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.

Specified by:
fromXML in class AbstractNormalizableScoringFunction
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
AbstractNormalizableScoringFunction.AbstractNormalizableScoringFunction(StringBuffer)

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: NormalizableScoringFunction
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Parameters:
index - the index of the parameter
Returns:
the size of the event space
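
The numbers quoted above follow from multiplying alphabet sizes. A minimal sketch, assuming a hypothetical helper sizeOfEventSpace that receives the alphabet size at each affected position:

```java
class EventSpaceSketch {

    /** Product of the alphabet sizes at the positions of the affected random variables. */
    static int sizeOfEventSpace(int... alphabetSizes) {
        int size = 1;
        for (int s : alphabetSizes) {
            size *= s;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(sizeOfEventSpace(4));    // one DNA position, as in a PWM
        System.out.println(sizeOfEventSpace(4, 4)); // position plus one predecessor, as in a WAM
    }
}
```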

getLogNormalizationConstant

public double getLogNormalizationConstant(int length)
Description copied from interface: VariableLengthScoringFunction
This method returns the logarithm of the normalization constant for a given sequence length.

Parameters:
length - the sequence length
Returns:
the logarithm of the normalization constant
See Also:
NormalizableScoringFunction.getLogNormalizationConstant()

getLogPartialNormalizationConstant

public double getLogPartialNormalizationConstant(int parameterIndex,
                                                 int length)
                                          throws Exception
Description copied from interface: VariableLengthScoringFunction
This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length.

Parameters:
parameterIndex - the index of the parameter
length - the sequence length
Returns:
the logarithm of the partial normalization constant
Throws:
Exception - if something went wrong
See Also:
NormalizableScoringFunction.getLogPartialNormalizationConstant(int)

getEss

public double getEss()
Description copied from interface: NormalizableScoringFunction
Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Returns:
the equivalent sample size.

toString

public String toString()
Overrides:
toString in class Object

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: NormalizableScoringFunction
This method computes a value that is proportional to

NormalizableScoringFunction.getEss() * NormalizableScoringFunction.getLogNormalizationConstant() + Math.log( prior )

where prior is the prior for the parameters of this model.

Returns:
a value that is proportional to NormalizableScoringFunction.getEss() * NormalizableScoringFunction.getLogNormalizationConstant() + Math.log( prior ).
See Also:
NormalizableScoringFunction.getEss(), NormalizableScoringFunction.getLogNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
Description copied from interface: NormalizableScoringFunction
This method computes the gradient of NormalizableScoringFunction.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivatives for the parameters of this model are entered
See Also:
NormalizableScoringFunction.getLogPriorTerm()
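
Note that the results are added to grad rather than written over it. The sketch below shows this accumulation with an illustrative log prior, -0.5 * sum(lambda_i^2), whose gradient is -lambda_i. This prior is an assumption chosen for brevity and is not the prior used by HMMScoringFunction; the class name is hypothetical.

```java
import java.util.Arrays;

class GradientAccumulationSketch {

    /**
     * Adds the gradient of the illustrative log prior -0.5 * sum(lambda_i^2)
     * to grad, beginning at index start.
     */
    static void addGradientOfLogPriorTerm(double[] lambda, double[] grad, int start) {
        for (int i = 0; i < lambda.length; i++) {
            grad[start + i] += -lambda[i]; // accumulate, do not overwrite
        }
    }

    public static void main(String[] args) {
        double[] grad = {1.0, 1.0, 1.0, 1.0}; // e.g. already holds the likelihood gradient
        double[] lambda = {2.0, 3.0};          // parameters of the component at offset 1
        addGradientOfLogPriorTerm(lambda, grad, 1);
        System.out.println(Arrays.toString(grad));
    }
}
```

Entries outside the range start to start + lambda.length - 1 are left untouched, so several components can contribute to the same gradient array.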

isNormalized

public boolean isNormalized()
Description copied from interface: NormalizableScoringFunction
This method indicates whether the implemented score is already normalized to 1 or not. The standard implementation returns false.

Specified by:
isNormalized in interface NormalizableScoringFunction
Overrides:
isNormalized in class AbstractNormalizableScoringFunction
Returns:
true if the implemented score is already normalized to 1, false otherwise

isInitialized

public boolean isInitialized()
Description copied from interface: ScoringFunction
This method can be used to determine whether the model is initialized. If the model is not initialized you should invoke the method ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]).

Returns:
true if the model is initialized, false otherwise

getMaximalMarkovOrder

public int getMaximalMarkovOrder()
Description copied from class: HomogeneousScoringFunction
Returns the maximal Markov order used.

Specified by:
getMaximalMarkovOrder in class HomogeneousScoringFunction
Returns:
the maximal Markov order used

getNumberOfRecommendedStarts

public int getNumberOfRecommendedStarts()
Description copied from interface: ScoringFunction
This method returns the number of recommended optimization starts. The standard implementation returns 1.

Specified by:
getNumberOfRecommendedStarts in interface ScoringFunction
Overrides:
getNumberOfRecommendedStarts in class AbstractNormalizableScoringFunction
Returns:
the number of recommended optimization starts

setParameterOptimization

public void setParameterOptimization(boolean optimize)
This method allows the user to specify whether the parameters should be optimized or not.

Parameters:
optimize - indicates if the parameters should be optimized or not

getAllConditionalStationaryDistributions

public double[][][] getAllConditionalStationaryDistributions()
This method returns the stationary conditional distributions. The first dimension of the result is used for the order, the second is used for encoding the context, and the third is used for the different values of the random variable.

For a homogeneous Markov model of order 2 it returns an array containing the stationary symbol distribution as first entry, the conditional stationary distribution of order 1 as second entry and the conditional distribution of order 2 as third entry.

Returns:
all conditional stationary distributions
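
The array layout can be sketched for the simpler case of order 1. The block below is a standalone illustration and not the Jstacs implementation: it assumes an ergodic transition matrix p with p[a][b] = P(b | a), approximates the stationary symbol distribution by power iteration (a technique chosen here for brevity), and arranges the result as [order][context][symbol].

```java
import java.util.Arrays;

class StationaryDistributionSketch {

    /** Approximates pi with pi = pi * P by repeated multiplication (power iteration). */
    static double[] stationary(double[][] p, int iterations) {
        int n = p.length;
        double[] pi = new double[n];
        Arrays.fill(pi, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int a = 0; a < n; a++) {
                for (int b = 0; b < n; b++) {
                    next[b] += pi[a] * p[a][b];
                }
            }
            pi = next;
        }
        return pi;
    }

    /**
     * Entry 0: order 0, a single (empty) context holding the stationary symbol distribution.
     * Entry 1: order 1, one context per preceding symbol, i.e. the rows of p.
     */
    static double[][][] allDistributions(double[][] p) {
        return new double[][][] { { stationary(p, 1000) }, p };
    }

    public static void main(String[] args) {
        double[][] p = {{0.9, 0.1}, {0.2, 0.8}};
        System.out.println(Arrays.deepToString(allDistributions(p)));
    }
}
```

For the matrix in main, the stationary distribution solves 0.1 * pi_0 = 0.2 * pi_1, which gives (2/3, 1/3).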

setStatisticForHyperparameters

public void setStatisticForHyperparameters(int[] length,
                                           double[] weight)
                                    throws Exception
Description copied from interface: VariableLengthScoringFunction
This method sets the hyperparameters for the model parameters by evaluating the given statistic. The statistic can be interpreted as follows: the model has seen a number of sequences, and for these sequences only their lengths (length) and how often they have been seen (weight) are known.

Parameters:
length - the non-negative lengths of the sequences
weight - the non-negative weight for the corresponding sequence
Throws:
Exception - if something went wrong
See Also:
Mutable

emit

public Sample emit(int numberOfSequences,
                   int... seqLength)
            throws Exception
This method returns a Sample object containing artificial sequence(s).

There are two ways to create a Sample:
  1. emit( int n, int l ) returns a Sample with n sequences of length l.
  2. emit( int n, int[] l ) returns a Sample with n sequences whose lengths correspond to the entries of the array.

Parameters:
numberOfSequences - the number of sequences that should be contained in the returned Sample
seqLength - the length of the sequences
Returns:
a Sample containing numberOfSequences artificial sequence(s)
Throws:
Exception - if the emission of the artificial Sample did not succeed
See Also:
Sample
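
The varargs signature covers both call styles. The following standalone sketch mimics them with plain arrays instead of a Sample. It is hypothetical throughout: symbols are drawn independently from a fixed distribution (order 0) to keep the example short, and the handling of a mismatched number of lengths is an assumption, since this page does not specify it.

```java
import java.util.Random;

class EmitSketch {

    /** Returns numberOfSequences integer-encoded sequences; one length for all or one per sequence. */
    static int[][] emit(Random rng, double[] symbolProbs, int numberOfSequences, int... seqLength) {
        if (seqLength.length != 1 && seqLength.length != numberOfSequences) {
            throw new IllegalArgumentException("give one length or one length per sequence");
        }
        int[][] sample = new int[numberOfSequences][];
        for (int n = 0; n < numberOfSequences; n++) {
            int len = seqLength.length == 1 ? seqLength[0] : seqLength[n];
            sample[n] = new int[len];
            for (int i = 0; i < len; i++) {
                sample[n][i] = draw(rng, symbolProbs);
            }
        }
        return sample;
    }

    /** Draws one symbol index according to probs by inverting the cumulative distribution. */
    static int draw(Random rng, double[] probs) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int a = 0; a < probs.length - 1; a++) {
            cumulative += probs[a];
            if (u < cumulative) {
                return a;
            }
        }
        return probs.length - 1; // remaining probability mass
    }

    public static void main(String[] args) {
        double[] probs = {0.25, 0.25, 0.25, 0.25};
        int[][] same = emit(new Random(1), probs, 3, 5);     // three sequences of length 5
        int[][] mixed = emit(new Random(1), probs, 2, 4, 7); // lengths 4 and 7
        System.out.println(same.length + " " + mixed[1].length);
    }
}
```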

initializeUniformly

public void initializeUniformly(boolean freeParams)
Description copied from class: HomogeneousScoringFunction
This method initializes the instance with a uniform distribution.

Specified by:
initializeUniformly in class HomogeneousScoringFunction
Parameters:
freeParams - a switch whether to take only free parameters or to take all