de.jstacs.sequenceScores.statisticalModels.differentiable
Class UniformDiffSM

java.lang.Object
  extended by de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
      extended by de.jstacs.sequenceScores.differentiable.UniformDiffSS
          extended by de.jstacs.sequenceScores.statisticalModels.differentiable.UniformDiffSM
All Implemented Interfaces:
DifferentiableSequenceScore, SequenceScore, DifferentiableStatisticalModel, SamplingDifferentiableStatisticalModel, StatisticalModel, Storable, Cloneable

public class UniformDiffSM
extends UniformDiffSS
implements SamplingDifferentiableStatisticalModel

This DifferentiableStatisticalModel does nothing. So it is possible to save parameters in an optimization.

Author:
Jens Keilwagen, Jan Grau

Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
alphabets, length, r
 
Fields inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
UNKNOWN
 
Constructor Summary
UniformDiffSM(AlphabetContainer alphabets, int length, double ess)
          This is the main constructor that creates an instance of a UniformDiffSM that models each sequence uniformly.
UniformDiffSM(StringBuffer xml)
          This is the constructor for the interface Storable.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model.
 DataSet emitDataSet(int numberOfSequences, int... seqLength)
          This method returns a DataSet object containing artificial sequence(s).
protected  void extractFurtherInformation(StringBuffer xml)
          This method is the opposite of UniformDiffSS.getFurtherInformation().
 double getESS()
          Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
protected  StringBuffer getFurtherInformation()
          This method is used to append further information of the instance to the XML representation.
 double getLogNormalizationConstant()
          Returns the logarithm of the sum of the scores over all sequences of the event space.
 double getLogPartialNormalizationConstant(int parameterIndex)
          Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex.
 double getLogPriorTerm()
          This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ) where prior is the prior for the parameters of this model.
 double getLogProbFor(Sequence sequence)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 double getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList dList)
          Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivations.
 double getLogScoreFor(Sequence seq, int start)
          Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 int[][] getSamplingGroups(int parameterOffset)
          Returns groups of indexes of parameters that shall be drawn together in a sampling procedure
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by parameter no.
 void initializeFunction(int index, boolean meila, DataSet[] data, double[][] weights)
          This method creates the underlying structure of the DifferentiableSequenceScore.
 boolean isNormalized()
          This method indicates whether the implemented score is already normalized to 1 or not.
 String toString()
           
 
Methods inherited from class de.jstacs.sequenceScores.differentiable.UniformDiffSS
fromXML, getCurrentParameterValues, getInstanceName, getNumberOfParameters, initializeFunctionRandomly, isInitialized, setParameters, toXML
 
Methods inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
clone, getAlphabetContainer, getCharacteristics, getInitialClassParam, getLength, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumberOfRecommendedStarts, getNumberOfStarts, getNumericalCharacteristics
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
clone, getCurrentParameterValues, getInitialClassParam, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getNumberOfParameters, getNumberOfRecommendedStarts, initializeFunctionRandomly, setParameters
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getAlphabetContainer, getCharacteristics, getInstanceName, getLength, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics, isInitialized
 
Methods inherited from interface de.jstacs.Storable
toXML
 

Constructor Detail

UniformDiffSM

public UniformDiffSM(AlphabetContainer alphabets,
                     int length,
                     double ess)
This is the main constructor that creates an instance of a UniformDiffSM that models each sequence uniformly.

Parameters:
alphabets - the AlphabetContainer
length - the length of the modeled sequences
ess - the equivalent sample size (ess) of the class

UniformDiffSM

public UniformDiffSM(StringBuffer xml)
              throws NonParsableException
This is the constructor for the interface Storable. Creates a new UniformDiffSM out of its XML representation as returned by UniformDiffSS.fromXML(StringBuffer).

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML representation could not be parsed
Method Detail

getLogScoreFor

public double getLogScoreFor(Sequence seq,
                             int start)
Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq beginning at position start in the Sequence.

Specified by:
getLogScoreFor in interface SequenceScore
Overrides:
getLogScoreFor in class UniformDiffSS
Parameters:
seq - the Sequence
start - the start position in the Sequence
Returns:
the logarithmic score for the Sequence

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              IntList indices,
                                              DoubleList dList)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivations.

Specified by:
getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
Overrides:
getLogScoreAndPartialDerivation in class UniformDiffSS
Parameters:
seq - the Sequence
start - the start position in the Sequence
indices - an IntList of indices, after method invocation the list should contain the indices i where $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
dList - a DoubleList of partial derivations, after method invocation the list should contain the corresponding $\frac{\partial \log score(seq)}{\partial \lambda_i}$ that are not zero
Returns:
the logarithmic score for the Sequence

getFurtherInformation

protected StringBuffer getFurtherInformation()
Description copied from class: UniformDiffSS
This method is used to append further information of the instance to the XML representation. This method is designed to allow subclasses to add information to the XML representation.

Overrides:
getFurtherInformation in class UniformDiffSS
Returns:
the further information as XML code in a StringBuffer
See Also:
UniformDiffSS.extractFurtherInformation(StringBuffer)

extractFurtherInformation

protected void extractFurtherInformation(StringBuffer xml)
                                  throws NonParsableException
Description copied from class: UniformDiffSS
This method is the opposite of UniformDiffSS.getFurtherInformation(). It extracts further information of the instance from a XML representation.

Overrides:
extractFurtherInformation in class UniformDiffSS
Parameters:
xml - the StringBuffer containing the information to be extracted as XML code
Throws:
NonParsableException - if the StringBuffer could not be parsed
See Also:
UniformDiffSS.getFurtherInformation()

getLogNormalizationConstant

public double getLogNormalizationConstant()
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the sum of the scores over all sequences of the event space.

Specified by:
getLogNormalizationConstant in interface DifferentiableStatisticalModel
Returns:
the logarithm of the normalization constant Z

initializeFunction

public void initializeFunction(int index,
                               boolean meila,
                               DataSet[] data,
                               double[][] weights)
Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.

Specified by:
initializeFunction in interface DifferentiableSequenceScore
Overrides:
initializeFunction in class UniformDiffSS
Parameters:
index - the index of the class the DifferentiableSequenceScore models
meila - indicates whether the (reduced) parameterization is used
data - the samples
weights - the weights of the sequences in the samples

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Specified by:
getSizeOfEventSpaceForRandomVariablesOfParameter in interface DifferentiableStatisticalModel
Parameters:
index - the index of the parameter
Returns:
the size of the event space

getLogPartialNormalizationConstant

public double getLogPartialNormalizationConstant(int parameterIndex)
                                          throws Exception
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. This is the logarithm of the partial derivation of the normalization constant for the parameter with index parameterIndex,
\[\log \frac{\partial Z(\underline{\lambda})}{\partial \lambda_{parameterindex}}\]
.

Specified by:
getLogPartialNormalizationConstant in interface DifferentiableStatisticalModel
Parameters:
parameterIndex - the index of the parameter
Returns:
the logarithm of the partial normalization constant
Throws:
Exception - if something went wrong with the normalization
See Also:
DifferentiableStatisticalModel.getLogNormalizationConstant()

getESS

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Specified by:
getESS in interface DifferentiableStatisticalModel
Returns:
the equivalent sample size.

toString

public String toString()
Overrides:
toString in class UniformDiffSS

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: DifferentiableStatisticalModel
This method computes a value that is proportional to

DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior )

where prior is the prior for the parameters of this model.

Specified by:
getLogPriorTerm in interface DifferentiableStatisticalModel
Specified by:
getLogPriorTerm in interface StatisticalModel
Returns:
a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ).
See Also:
DifferentiableStatisticalModel.getESS(), DifferentiableStatisticalModel.getLogNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Specified by:
addGradientOfLogPriorTerm in interface DifferentiableStatisticalModel
Parameters:
grad - the array of gradients
start - the start index in the grad array, where the partial derivations for the parameters of this models shall be entered
See Also:
DifferentiableStatisticalModel.getLogPriorTerm()

isNormalized

public boolean isNormalized()
Description copied from interface: DifferentiableStatisticalModel
This method indicates whether the implemented score is already normalized to 1 or not. The standard implementation returns false.

Specified by:
isNormalized in interface DifferentiableStatisticalModel
Returns:
true if the implemented score is already normalized to 1, false otherwise

getSamplingGroups

public int[][] getSamplingGroups(int parameterOffset)
Description copied from interface: SamplingDifferentiableStatisticalModel
Returns groups of indexes of parameters that shall be drawn together in a sampling procedure

Specified by:
getSamplingGroups in interface SamplingDifferentiableStatisticalModel
Parameters:
parameterOffset - a global offset on the parameter indexes
Returns:
the groups of indexes. The first dimension represents the different groups while the second dimension contains the parameters that shall be sampled together. Internal parameter indexes need to be increased by parameterOffset.

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos)
                     throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the value of density function is returned.

If the length of the sequences, whose probability should be returned, is fixed (e.g. in a inhomogeneous model) and the given sequence is longer than their fixed length, the start position within the given sequence is given by startpos. E.g. the fixed length is 12. The length of the given sequence is 30 and the startpos=15 the logarithm of the probability of the part from position 15 to 26 (inclusive) given the model should be returned.
The length and the alphabets define the type of data that can be modeled and therefore both has to be checked.

Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)

getLogProbFor

public double getLogProbFor(Sequence sequence)
                     throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous the value of density function is returned.

The length and the alphabets define the type of data that can be modeled and therefore both has to be checked.

Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence for which the logarithm of the probability/the value of the density function should be returned
Returns:
the logarithm of the probability or the value of the density function of the part of the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
StatisticalModel.getLogProbFor(Sequence, int, int)

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the value of density function is returned.

It extends the possibility given by the method StatisticalModel.getLogProbFor(Sequence, int) by the fact, that the model could be e.g. homogeneous and therefore the length of the sequences, whose probability should be returned, is not fixed. Additionally, the end position of the part of the given sequence is given and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled and therefore both has to be checked.

Specified by:
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model
NotTrainedException - if the model is not trained yet

emitDataSet

public DataSet emitDataSet(int numberOfSequences,
                           int... seqLength)
                    throws NotTrainedException,
                           Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).

There are two different possibilities to create a sample for a model with length 0 (homogeneous models).
  1. emitDataSet( int n, int l ) should return a data set with n sequences of length l.
  2. emitDataSet( int n, int[] l ) should return a data set with n sequences which have a sequence length corresponding to the entry in the given array l.

There are two different possibilities to create a sample for a model with length greater than 0 (inhomogeneous models).
emitDataSet( int n ) and emitDataSet( int n, null ) should return a sample with n sequences of length of the model ( SequenceScore.getLength()).

The standard implementation throws an Exception.

Specified by:
emitDataSet in interface StatisticalModel
Parameters:
numberOfSequences - the number of sequences that should be contained in the returned sample
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a DataSet containing the artificial sequence(s)
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the emission did not succeed
See Also:
DataSet

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer