de.jstacs.sequenceScores.statisticalModels.differentiable.mixture
Class MixtureDiffSM

java.lang.Object
  extended by de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
      extended by de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
          extended by de.jstacs.sequenceScores.statisticalModels.differentiable.mixture.AbstractMixtureDiffSM
              extended by de.jstacs.sequenceScores.statisticalModels.differentiable.mixture.MixtureDiffSM
All Implemented Interfaces:
MotifDiscoverer, MutableMotifDiscoverer, DifferentiableSequenceScore, SequenceScore, DifferentiableStatisticalModel, SamplingDifferentiableStatisticalModel, StatisticalModel, Storable, Cloneable
Direct Known Subclasses:
VariableLengthMixtureDiffSM

public class MixtureDiffSM
extends AbstractMixtureDiffSM
implements MutableMotifDiscoverer

This class implements a real mixture model, i.e. a mixture over several DifferentiableStatisticalModel components.

Author:
Jens Keilwagen

Nested Class Summary
 
Nested classes/interfaces inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer
MotifDiscoverer.KindOfProfile
 
Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.mixture.AbstractMixtureDiffSM
componentScore, dList, freeParams, function, hiddenParameter, hiddenPotential, iList, logGammaSum, logHiddenNorm, logHiddenPotential, norm, optimizeHidden, paramRef, partNorm
 
Fields inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
alphabets, length, r
 
Fields inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
UNKNOWN
 
Constructor Summary
MixtureDiffSM(int starts, boolean plugIn, DifferentiableStatisticalModel... component)
          This constructor creates a new MixtureDiffSM.
MixtureDiffSM(StringBuffer xml)
          This is the constructor for the interface Storable.
 
Method Summary
 void adjustHiddenParameters(int index, DataSet[] data, double[][] weights)
          Adjusts all hidden parameters, including duration and mixture parameters, according to the current values of the remaining parameters.
 MixtureDiffSM clone()
          Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.
protected  void fillComponentScores(Sequence seq, int start)
          Fills the internal array AbstractMixtureDiffSM.componentScore with the logarithmic scores of the components given a Sequence.
 double getESS()
          Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.
 int getGlobalIndexOfMotifInComponent(int component, int motif)
          Returns the global index of the motif used in component.
 double getHyperparameterForHiddenParameter(int index)
          This method returns the hyperparameter for the hidden parameter with index index.
 int getIndexOfMaximalComponentFor(Sequence sequence)
          Returns the index of the component with the maximal score for the sequence sequence.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
protected  double getLogNormalizationConstantForComponent(int i)
          Computes the logarithm of the normalization constant for the component i.
 double getLogPartialNormalizationConstant(int parameterIndex)
          Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex.
 double getLogScoreAndPartialDerivation(Sequence seq, int start, IntList indices, DoubleList partialDer)
          Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivatives.
 int getMotifLength(int motif)
          This method returns the length of the motif with index motif.
 int getNumberOfMotifs()
          Returns the number of motifs for this MotifDiscoverer.
 int getNumberOfMotifsInComponent(int component)
          Returns the number of motifs that are used in the component component of this MotifDiscoverer.
 double[] getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind)
          Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos.
 double[] getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos)
          This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component.
 void initializeMotif(int motifIndex, DataSet data, double[] weights)
          This method allows the model of a motif to be initialized manually using a weighted data set.
 void initializeMotifRandomly(int motif)
          This method initializes the motif with index motif randomly, using for instance DifferentiableSequenceScore.initializeFunctionRandomly(boolean).
protected  void initializeUsingPlugIn(int index, boolean freeParams, DataSet[] data, double[][] weights)
          This method initializes the functions using the data in some way.
 boolean modifyMotif(int motifIndex, int offsetLeft, int offsetRight)
          Manually modifies the motif model with index motifIndex.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.mixture.AbstractMixtureDiffSM
addGradientOfLogPriorTerm, cloneFunctions, computeHiddenParameter, computeLogGammaSum, determineIsNormalized, extractFurtherInformation, fromXML, getCurrentParameterValues, getDifferentiableStatisticalModels, getFunction, getFunctions, getFurtherInformation, getIndexOfMaximalComponentFor, getIndices, getLogNormalizationConstant, getLogPriorTerm, getLogScoreFor, getNumberOfComponents, getNumberOfParameters, getNumberOfRecommendedStarts, getProbsForComponent, getSamplingGroups, getSizeOfEventSpaceForRandomVariablesOfParameter, getXMLTag, init, initializeFunction, initializeFunctionRandomly, initializeHiddenPotentialRandomly, initializeHiddenUniformly, initWithLength, isInitialized, isNormalized, precomputeNorm, setHiddenParameters, setParameters, setParametersForFunction, toXML
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.differentiable.AbstractDifferentiableStatisticalModel
emitDataSet, getInitialClassParam, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, isNormalized
 
Methods inherited from class de.jstacs.sequenceScores.differentiable.AbstractDifferentiableSequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation, getLogScoreFor, getLogScoreFor, getNumberOfStarts, getNumericalCharacteristics
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer
getNumberOfComponents
 
Methods inherited from interface de.jstacs.Storable
toXML
 
Methods inherited from interface de.jstacs.sequenceScores.differentiable.DifferentiableSequenceScore
getInitialClassParam, getLogScoreAndPartialDerivation, getLogScoreAndPartialDerivation
 
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel
emitDataSet, getLogProbFor, getLogProbFor, getLogProbFor, getMaximalMarkovOrder
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
getAlphabetContainer, getCharacteristics, getLength, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics
 

Constructor Detail

MixtureDiffSM

public MixtureDiffSM(int starts,
                     boolean plugIn,
                     DifferentiableStatisticalModel... component)
              throws CloneNotSupportedException
This constructor creates a new MixtureDiffSM. The first component determines the length of the sequences that can be modeled.

Parameters:
starts - the number of starts to be performed in an optimization
plugIn - indicates whether the initial parameters for an optimization should be derived from the data (plug-in) or drawn randomly
component - the DifferentiableStatisticalModels used as mixture components
Throws:
CloneNotSupportedException - if an element of component could not be cloned

MixtureDiffSM

public MixtureDiffSM(StringBuffer xml)
              throws NonParsableException
This is the constructor for the interface Storable. Creates a new MixtureDiffSM out of a StringBuffer.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the XML representation could not be parsed

Method Detail

clone

public MixtureDiffSM clone()
                    throws CloneNotSupportedException
Description copied from interface: DifferentiableSequenceScore
Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.

Specified by:
clone in interface MotifDiscoverer
Specified by:
clone in interface DifferentiableSequenceScore
Specified by:
clone in interface SequenceScore
Overrides:
clone in class AbstractMixtureDiffSM
Returns:
the cloned instance of the current DifferentiableSequenceScore
Throws:
CloneNotSupportedException - if something went wrong while cloning the DifferentiableSequenceScore
See Also:
Cloneable

getLogNormalizationConstantForComponent

protected double getLogNormalizationConstantForComponent(int i)
Description copied from class: AbstractMixtureDiffSM
Computes the logarithm of the normalization constant for the component i.

Specified by:
getLogNormalizationConstantForComponent in class AbstractMixtureDiffSM
Parameters:
i - the index of the component
Returns:
the logarithm of the normalization constant of the component

getLogPartialNormalizationConstant

public double getLogPartialNormalizationConstant(int parameterIndex)
                                          throws Exception
Description copied from interface: DifferentiableStatisticalModel
Returns the logarithm of the partial normalization constant for the parameter with index parameterIndex. This is the logarithm of the partial derivative of the normalization constant with respect to the parameter with index parameterIndex,
\[\log \frac{\partial Z(\underline{\lambda})}{\partial \lambda_{parameterIndex}}.\]

Specified by:
getLogPartialNormalizationConstant in interface DifferentiableStatisticalModel
Parameters:
parameterIndex - the index of the parameter
Returns:
the logarithm of the partial normalization constant
Throws:
Exception - if something went wrong with the normalization
See Also:
DifferentiableStatisticalModel.getLogNormalizationConstant()
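
To make the definition above concrete, here is a minimal sketch for a toy normalization constant Z(lambda) = sum_i exp(lambda_i). This toy form is an assumption chosen for illustration and is not the normalization constant of MixtureDiffSM; the class PartialNormSketch and its methods are hypothetical. For this Z, the partial derivative with respect to lambda_i is exp(lambda_i), so its logarithm is simply lambda_i, which the sketch compares against a central finite difference.

```java
// Hypothetical toy example, not part of Jstacs.
class PartialNormSketch {

    // Toy normalization constant Z(lambda) = sum_i exp(lambda_i) (an assumption).
    static double norm(double[] lambda) {
        double z = 0;
        for (double l : lambda) {
            z += Math.exp(l);
        }
        return z;
    }

    // For this toy Z, dZ/dlambda_i = exp(lambda_i), so the logarithm is lambda_i.
    static double logPartialNorm(double[] lambda, int parameterIndex) {
        return lambda[parameterIndex];
    }

    // Central finite difference of Z with respect to lambda[parameterIndex].
    static double numericalPartialNorm(double[] lambda, int parameterIndex, double h) {
        double[] plus = lambda.clone();
        double[] minus = lambda.clone();
        plus[parameterIndex] += h;
        minus[parameterIndex] -= h;
        return (norm(plus) - norm(minus)) / (2 * h);
    }

    public static void main(String[] args) {
        double[] lambda = { 0.5, -1.0, 2.0 };
        // Analytic value and numerical check should agree closely.
        System.out.println(logPartialNorm(lambda, 2));
        System.out.println(Math.log(numericalPartialNorm(lambda, 2, 1e-6)));
    }
}
```

Working with the logarithm of the derivative rather than the derivative itself keeps the values in a numerically safe range when Z is very large or very small.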

getHyperparameterForHiddenParameter

public double getHyperparameterForHiddenParameter(int index)
Description copied from class: AbstractMixtureDiffSM
This method returns the hyperparameter for the hidden parameter with index index.

Specified by:
getHyperparameterForHiddenParameter in class AbstractMixtureDiffSM
Parameters:
index - the index of the hidden parameter
Returns:
the hyperparameter for the hidden parameter

getESS

public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model, i.e. the equivalent sample size for the class or component that is represented by this model.

Specified by:
getESS in interface DifferentiableStatisticalModel
Returns:
the equivalent sample size.

initializeUsingPlugIn

protected void initializeUsingPlugIn(int index,
                                     boolean freeParams,
                                     DataSet[] data,
                                     double[][] weights)
                              throws Exception
Description copied from class: AbstractMixtureDiffSM
This method initializes the functions using the data in some way.

Specified by:
initializeUsingPlugIn in class AbstractMixtureDiffSM
Parameters:
index - the class index
freeParams - if true, the (reduced) parameterization is used
data - the data
weights - the weights for the data
Throws:
Exception - if the initialization could not be done
See Also:
DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][])

adjustHiddenParameters

public void adjustHiddenParameters(int index,
                                   DataSet[] data,
                                   double[][] weights)
                            throws Exception
Adjusts all hidden parameters, including duration and mixture parameters, according to the current values of the remaining parameters.

Specified by:
adjustHiddenParameters in interface MutableMotifDiscoverer
Parameters:
index - the index of the class of this instance
data - the array of data for all classes
weights - the weights for all sequences in data
Throws:
Exception - thrown if the hidden parameters could not be adjusted

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Returns:
a short instance name

fillComponentScores

protected void fillComponentScores(Sequence seq,
                                   int start)
Description copied from class: AbstractMixtureDiffSM
Fills the internal array AbstractMixtureDiffSM.componentScore with the logarithmic scores of the components given a Sequence.

Specified by:
fillComponentScores in class AbstractMixtureDiffSM
Parameters:
seq - the sequence
start - the start position in seq
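
As a rough illustration of how per-component logarithmic scores can be combined into a single mixture score, the following sketch applies the log-sum-exp trick to an array of component log scores. It is not the Jstacs implementation: the class MixtureScoreSketch and the method logSumExp are hypothetical, and the array entries are assumed to already contain the logarithmic mixture weight plus the logarithmic score of each component.

```java
// Hypothetical sketch, not part of Jstacs.
class MixtureScoreSketch {

    // Computes log(sum_i exp(componentScore[i])) in a numerically stable way.
    static double logSumExp(double[] componentScore) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : componentScore) {
            max = Math.max(max, s);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every component has score log(0)
        }
        double sum = 0;
        for (double s : componentScore) {
            sum += Math.exp(s - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // Assumed toy values: log(weight_i * score_i) for two components.
        double[] componentScore = { Math.log(0.2), Math.log(0.3) };
        System.out.println(Math.exp(logSumExp(componentScore))); // approximately 0.5
    }
}
```

Subtracting the maximum before exponentiating avoids underflow when all component scores are strongly negative, which is typical for log scores of long sequences.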

getLogScoreAndPartialDerivation

public double getLogScoreAndPartialDerivation(Sequence seq,
                                              int start,
                                              IntList indices,
                                              DoubleList partialDer)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivatives.

Specified by:
getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
Parameters:
seq - the Sequence
start - the start position in the Sequence
indices - an IntList of indices; after the method invocation the list should contain the indices i for which $\frac{\partial \log score(seq)}{\partial \lambda_i}$ is not zero
partialDer - a DoubleList of partial derivatives; after the method invocation the list should contain the corresponding non-zero values of $\frac{\partial \log score(seq)}{\partial \lambda_i}$
Returns:
the logarithmic score for the Sequence
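
The calling convention described above (return the log score, append only the non-zero partial derivatives together with their parameter indices) can be sketched with a toy log-linear score. Everything here is an assumption for illustration: the class SparseGradientSketch is hypothetical, java.util.List stands in for IntList and DoubleList, and the toy score is the sum of lambda[symbol] over all positions from start, so the derivative with respect to lambda_i is the number of occurrences of symbol i.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Jstacs.
class SparseGradientSketch {

    // Returns the toy log score and appends the indices and values of all
    // non-zero partial derivatives to the two lists.
    static double logScoreAndPartialDerivation(int[] seq, int start, double[] lambda,
            List<Integer> indices, List<Double> partialDer) {
        double logScore = 0;
        int[] counts = new int[lambda.length];
        for (int p = start; p < seq.length; p++) {
            logScore += lambda[seq[p]];
            counts[seq[p]]++;
        }
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) { // skip parameters with zero derivative
                indices.add(i);
                partialDer.add((double) counts[i]);
            }
        }
        return logScore;
    }

    public static void main(String[] args) {
        List<Integer> indices = new ArrayList<>();
        List<Double> partialDer = new ArrayList<>();
        double[] lambda = { 0.1, 0.2, 0.3, 0.4 };
        double score = logScoreAndPartialDerivation(new int[] { 0, 2, 2, 1 }, 1, lambda, indices, partialDer);
        System.out.println(score + " " + indices + " " + partialDer);
    }
}
```

Reporting only the non-zero entries keeps the gradient sparse, which matters when a model has many parameters but a single sequence touches few of them.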

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

initializeMotif

public void initializeMotif(int motifIndex,
                            DataSet data,
                            double[] weights)
                     throws Exception
Description copied from interface: MutableMotifDiscoverer
This method allows the model of a motif to be initialized manually using a weighted data set.

Specified by:
initializeMotif in interface MutableMotifDiscoverer
Parameters:
motifIndex - the index of the motif in the motif discoverer
data - the data set of sequences
weights - either null or an array of length data.getNumberOfElements() with non-negative weights.
Throws:
Exception - if the initialization was not possible

initializeMotifRandomly

public void initializeMotifRandomly(int motif)
                             throws Exception
Description copied from interface: MutableMotifDiscoverer
This method initializes the motif with index motif randomly, using for instance DifferentiableSequenceScore.initializeFunctionRandomly(boolean). Furthermore, if available, it also initializes the positional distribution.

Specified by:
initializeMotifRandomly in interface MutableMotifDiscoverer
Parameters:
motif - the index of the motif
Throws:
Exception - either if the index is wrong or if it is thrown by the method DifferentiableSequenceScore.initializeFunctionRandomly(boolean)

modifyMotif

public boolean modifyMotif(int motifIndex,
                           int offsetLeft,
                           int offsetRight)
                    throws Exception
Description copied from interface: MutableMotifDiscoverer
Manually modifies the motif model with index motifIndex. The two offsets offsetLeft and offsetRight define how many positions the left or right border positions shall be moved. Negative numbers indicate moves to the left while positive numbers correspond to moves to the right. The distribution for sequences to the left and right side of the motif shall be computed internally.

Specified by:
modifyMotif in interface MutableMotifDiscoverer
Parameters:
motifIndex - the index of the motif in the motif discoverer
offsetLeft - the offset on the left side
offsetRight - the offset on the right side
Returns:
true if the motif model was modified, otherwise false
Throws:
Exception - if some unexpected error occurred during the modification
See Also:
MutableMotifDiscoverer.modifyMotif(int, int, int), Mutable.modify(int, int)
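
The sign convention for the two offsets can be worked through with a small sketch. It only derives the arithmetic implied by the description (negative offsets move a border to the left, positive offsets to the right) and does not reproduce how Jstacs modifies the underlying model; the class MotifBorderSketch and its methods are hypothetical, and the right border is assumed to be exclusive.

```java
// Hypothetical sketch, not part of Jstacs.
class MotifBorderSketch {

    // Shifts the borders [left, right) of a motif; returns {newLeft, newRight}.
    static int[] shiftBorders(int left, int right, int offsetLeft, int offsetRight) {
        return new int[] { left + offsetLeft, right + offsetRight };
    }

    // Moving the left border left (negative offset) or the right border
    // right (positive offset) lengthens the motif.
    static int newLength(int length, int offsetLeft, int offsetRight) {
        return length - offsetLeft + offsetRight;
    }

    public static void main(String[] args) {
        // Extend an 8 bp motif by one position on the left and two on the right.
        System.out.println(newLength(8, -1, 2));
        // Shrink it by two positions on the left.
        System.out.println(newLength(8, 2, 0));
    }
}
```

Under this reading, offsetLeft = -1 and offsetRight = 2 lengthen a motif by three positions, while offsetLeft = 2 and offsetRight = 0 shorten it by two.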

getGlobalIndexOfMotifInComponent

public int getGlobalIndexOfMotifInComponent(int component,
                                            int motif)
Description copied from interface: MotifDiscoverer
Returns the global index of the motif used in component. The index returned must be at least 0 and less than MotifDiscoverer.getNumberOfMotifs().

Specified by:
getGlobalIndexOfMotifInComponent in interface MotifDiscoverer
Parameters:
component - the component index
motif - the motif index in the component
Returns:
the global index of the motif in component
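
One simple way to obtain a global motif index that satisfies the stated range constraint is to number the motifs consecutively across components. The sketch below assumes exactly that scheme; whether MixtureDiffSM numbers motifs this way is not stated here, and the class GlobalMotifIndexSketch and the array motifsPerComponent are hypothetical.

```java
// Hypothetical sketch, not part of Jstacs.
class GlobalMotifIndexSketch {

    // Maps (component, motif) to a global index by counting the motifs
    // of all preceding components (assumed consecutive numbering).
    static int globalIndex(int[] motifsPerComponent, int component, int motif) {
        if (motif < 0 || motif >= motifsPerComponent[component]) {
            throw new IndexOutOfBoundsException("no motif " + motif + " in component " + component);
        }
        int offset = 0;
        for (int c = 0; c < component; c++) {
            offset += motifsPerComponent[c];
        }
        return offset + motif;
    }

    public static void main(String[] args) {
        int[] motifsPerComponent = { 2, 0, 3 }; // assumed toy configuration
        System.out.println(globalIndex(motifsPerComponent, 2, 1));
    }
}
```

With this numbering, every returned index lies between 0 and the total number of motifs minus one, matching the constraint in the description.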

getIndexOfMaximalComponentFor

public int getIndexOfMaximalComponentFor(Sequence sequence)
                                  throws Exception
Description copied from interface: MotifDiscoverer
Returns the index of the component with the maximal score for the sequence sequence.

Specified by:
getIndexOfMaximalComponentFor in interface MotifDiscoverer
Parameters:
sequence - the given sequence
Returns:
the index of the component with the maximal score for the given sequence
Throws:
Exception - if the index could not be computed for any reason
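
Selecting the component with the maximal score amounts to an argmax over the per-component scores. The following sketch shows this on a plain array; the class MaximalComponentSketch is hypothetical, and resolving ties in favor of the lowest index is an assumption, not documented behavior.

```java
// Hypothetical sketch, not part of Jstacs.
class MaximalComponentSketch {

    // Returns the index of the largest entry; ties go to the lowest index.
    static int indexOfMaximalComponent(double[] componentScore) {
        int best = 0;
        for (int i = 1; i < componentScore.length; i++) {
            if (componentScore[i] > componentScore[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] componentScore = { -3.2, -1.5, -2.0 }; // assumed toy log scores
        System.out.println(indexOfMaximalComponent(componentScore));
    }
}
```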

getMotifLength

public int getMotifLength(int motif)
Description copied from interface: MotifDiscoverer
This method returns the length of the motif with index motif.

Specified by:
getMotifLength in interface MotifDiscoverer
Parameters:
motif - the index of the motif
Returns:
the length of the motif with index motif

getNumberOfMotifs

public int getNumberOfMotifs()
Description copied from interface: MotifDiscoverer
Returns the number of motifs for this MotifDiscoverer.

Specified by:
getNumberOfMotifs in interface MotifDiscoverer
Returns:
the number of motifs

getNumberOfMotifsInComponent

public int getNumberOfMotifsInComponent(int component)
Description copied from interface: MotifDiscoverer
Returns the number of motifs that are used in the component component of this MotifDiscoverer.

Specified by:
getNumberOfMotifsInComponent in interface MotifDiscoverer
Parameters:
component - the component of the MotifDiscoverer
Returns:
the number of motifs

getProfileOfScoresFor

public double[] getProfileOfScoresFor(int component,
                                      int motif,
                                      Sequence sequence,
                                      int startpos,
                                      MotifDiscoverer.KindOfProfile kind)
                               throws Exception
Description copied from interface: MotifDiscoverer
Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. This array should be of length
sequence.length() - startpos - motifs[motif].getLength() + 1.
A high score should indicate a probable start position.

Specified by:
getProfileOfScoresFor in interface MotifDiscoverer
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
kind - indicates the kind of profile
Returns:
the profile of scores
Throws:
Exception - if the score could not be computed for any reason
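
The length formula for the profile can be checked with a small sketch that slides a motif over a sequence. The scoring used here (number of positions matching a consensus string) is a stand-in chosen for illustration and has nothing to do with the actual motif models; the class ProfileSketch and its methods are hypothetical.

```java
// Hypothetical sketch, not part of Jstacs.
class ProfileSketch {

    // Number of possible motif start positions at or after startpos.
    static int profileLength(int sequenceLength, int startpos, int motifLength) {
        return sequenceLength - startpos - motifLength + 1;
    }

    // Toy profile: for each start position, count the matches to a consensus.
    static double[] profile(String sequence, int startpos, String consensus) {
        int n = profileLength(sequence.length(), startpos, consensus.length());
        double[] scores = new double[Math.max(n, 0)];
        for (int i = 0; i < scores.length; i++) {
            int matches = 0;
            for (int j = 0; j < consensus.length(); j++) {
                if (sequence.charAt(startpos + i + j) == consensus.charAt(j)) {
                    matches++;
                }
            }
            scores[i] = matches;
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] scores = profile("ACGTACGT", 2, "TAC");
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

For a sequence of length 8, startpos 2 and a motif of length 3, the formula gives 8 - 2 - 3 + 1 = 4 entries, one for each window that still fits into the sequence.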

getStrandProbabilitiesFor

public double[] getStrandProbabilitiesFor(int component,
                                          int motif,
                                          Sequence sequence,
                                          int startpos)
                                   throws Exception
Description copied from interface: MotifDiscoverer
This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component.

Specified by:
getStrandProbabilitiesFor in interface MotifDiscoverer
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
Returns:
the probabilities of the strand orientations
Throws:
Exception - if the strand could not be computed for any reason
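
One plausible way to turn two orientation-specific log scores into strand probabilities is to normalize them so that they sum to one. The sketch below does this for a forward and a reverse-complement log score; it is an assumption about how such probabilities could be obtained, not the documented behavior of this method, and the class StrandProbabilitySketch is hypothetical. Both inputs are assumed to be finite.

```java
// Hypothetical sketch, not part of Jstacs.
class StrandProbabilitySketch {

    // Normalizes two finite log scores to probabilities {forward, reverse}.
    static double[] strandProbabilities(double logScoreForward, double logScoreReverse) {
        double max = Math.max(logScoreForward, logScoreReverse);
        double f = Math.exp(logScoreForward - max); // shift by max for stability
        double r = Math.exp(logScoreReverse - max);
        double sum = f + r;
        return new double[] { f / sum, r / sum };
    }

    public static void main(String[] args) {
        // Assumed toy scores: the forward strand is three times as likely.
        double[] p = strandProbabilities(Math.log(0.03), Math.log(0.01));
        System.out.println(p[0] + " " + p[1]);
    }
}
```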