de.jstacs.scoringFunctions.mix
Class AbstractMixtureScoringFunction

java.lang.Object
  extended by de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
      extended by de.jstacs.scoringFunctions.mix.AbstractMixtureScoringFunction
All Implemented Interfaces:
NormalizableScoringFunction, ScoringFunction, Storable, Cloneable
Direct Known Subclasses:
MixtureScoringFunction

public abstract class AbstractMixtureScoringFunction
extends AbstractNormalizableScoringFunction

This is the main abstract class for any mixture (e.g. "real" mixture, strand mixture, hidden motif, ...). The potential for the hidden variables is parameterized depending on the parameterization of the given NormalizableScoringFunctions. If these are already normalized (see NormalizableScoringFunction.isNormalized()), the potential is parameterized using the Meila-parameterization; otherwise it is parameterized using the unnormalized MRF-parameterization.
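
As a rough illustration of how a mixture score can be assembled from component scores and hidden potentials, the following sketch adds the log hidden potential to each component log score and combines the results with a numerically stable log-sum-exp. The class MixtureScoreSketch and its methods are hypothetical and not part of Jstacs; the combination rule is an assumption, and the actual implementation may differ.

```java
/** Hypothetical sketch; not part of Jstacs. */
final class MixtureScoreSketch {

    /** Returns log(sum_i exp(values[i])) in a numerically stable way. */
    static double logSumExp(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY; // every term is zero in linear space
        }
        double sum = 0;
        for (double v : values) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Assumed mixture score: log of sum_i hiddenPotential[i] * score_i,
     * computed from log hidden potentials and component log scores.
     */
    static double mixtureLogScore(double[] logHiddenPotential, double[] componentLogScore) {
        double[] joint = new double[componentLogScore.length];
        for (int i = 0; i < joint.length; i++) {
            joint[i] = logHiddenPotential[i] + componentLogScore[i];
        }
        return logSumExp(joint);
    }

    public static void main(String[] args) {
        double[] logPotential = { Math.log(0.25), Math.log(0.75) };
        double[] logScore = { Math.log(0.2), Math.log(0.4) };
        // In linear space this is 0.25 * 0.2 + 0.75 * 0.4.
        System.out.println(Math.exp(mixtureLogScore(logPotential, logScore)));
    }
}
```

Working in log space avoids underflow when the component scores of long sequences become very small.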

Author:
Jens Keilwagen

Field Summary
protected  double[] componentScore
          This array is used while computing the score.
protected  DoubleList[] dList
          This array contains some DoubleLists that are used while computing the partial derivatives.
protected  NormalizableScoringFunction[] function
          This array contains the internal functions that are used to determine the score.
protected  double[] hiddenParameter
          This array contains the hidden parameters of the instance.
protected  double[] hiddenPotential
          This array contains the hidden potentials of the instance.
protected  IntList[] iList
          This array contains some IntLists that are used while computing the partial derivatives.
protected  boolean isNormalized
          This boolean indicates whether this instance is a normalized one or not.
protected  double logGammaSum
          This double contains the sum of the logarithms of the gamma functions used in the prior.
protected  double logHiddenNorm
          This double contains the logarithm of the normalization constant of the hidden parameters of the instance.
protected  double[] logHiddenPotential
          This array contains the logarithm of the hidden potentials of the instance
protected  double norm
          This double contains the normalization constant of the instance.
protected  boolean optimizeHidden
          This boolean indicates whether to optimize the hidden variables of this instance.
protected  int[] paramRef
          This array contains the references/indices for the parameters.
protected  double[] partNorm
          This array contains the partial normalization constants, i.e. the normalization constant for each component.
protected  boolean plugIn
          This boolean indicates whether to use a plug-in strategy to initialize the instance.
 
Fields inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
alphabets, length, r
 
Fields inherited from interface de.jstacs.scoringFunctions.ScoringFunction
UNKNOWN
 
Constructor Summary
AbstractMixtureScoringFunction(int length, int starts, int dimension, boolean optimizeHidden, boolean plugIn, NormalizableScoringFunction... function)
          This constructor creates an AbstractMixtureScoringFunction.
AbstractMixtureScoringFunction(StringBuffer xml)
          This is the constructor for Storable.
 
Method Summary
 void addGradientOfLogPriorTerm(double[] grad, int start)
          This method computes the gradient of getLogPriorTerm() for each parameter of this model.
 AbstractMixtureScoringFunction clone()
          Creates a clone (deep copy) of the current ScoringFunction instance.
protected  void cloneFunctions(NormalizableScoringFunction[] originalFunctions)
          This method clones the given array of functions and enables the user to do some postprocessing.
protected  void computeHiddenParameter(double[] statistic)
          This method has to be invoked during an initialization.
protected  void computeLogGammaSum()
          This method is used to precompute the sum of the logarithm of the gamma functions that is used in the prior.
protected  void extractFurtherInformation(StringBuffer xml)
          This method is the opposite of getFurtherInformation().
protected abstract  void fillComponentScores(Sequence seq, int start)
          Fills the internal array componentScore with the log scores of the components.
protected  void fromXML(StringBuffer b)
          This method is called in the constructor to create a scoring function from a StringBuffer
 double[] getCurrentParameterValues()
          Returns a double array of dimension getNumberOfParameters() containing the current parameter values.
 NormalizableScoringFunction getFunction(int index)
          This method returns a specific internal function
 NormalizableScoringFunction[] getFunctions()
          This method returns an array of clones of the internally used functions.
protected  StringBuffer getFurtherInformation()
          This method is used to append further information of the instance to the xml representation.
abstract  double getHyperparameterForHiddenParameter(int index)
          This method returns the hyperparameter for the hidden parameter with index index.
 int getIndexOfMaximalComponentFor(Sequence seq, int start)
          Returns the index of the component that has the greatest impact on the complete score
protected  int[] getIndices(int index)
          This method computes the relative indices for a parameter index.
 double getLogPriorTerm()
          This method computes a value that is proportional to getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior ).
 double getLogScore(Sequence seq, int start)
          Returns the log score for the sequence
protected static int getMaxIndex(double[] w)
          Returns the index with maximal value in the array.
 double getNormalizationConstant()
          Returns the sum of the scores over all sequences of the event space.
protected abstract  double getNormalizationConstantForComponent(int i)
          Computes the normalization constant for the component i
 int getNumberOfComponents()
          Returns the number of different components.
 int getNumberOfParameters()
          The number of parameters in this scoring function.
 int getNumberOfRecommendedStarts()
          This method returns the number of recommended optimization starts.
 double[] getProbsForComponent(Sequence seq)
          Returns the probabilities for each component
 NormalizableScoringFunction[] getScoringFunctions()
          Returns a deep copy of all internally used ScoringFunctions.
 int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
          Returns the size of the event space of the random variables that are affected by the parameter with the given index.
protected  String getXMLTag()
          This method returns the XML tag of the instance that is used to build an XML representation.
protected  void init(boolean freeParams)
          This method creates the underlying structure for the parameters.
 void initializeFunction(int index, boolean freeParams, Sample[] data, double[][] weights)
          This method creates the underlying structure of the scoring function.
 void initializeFunctionRandomly(boolean freeParams)
          This method initializes the scoring function randomly.
protected  void initializeHiddenPotentialRandomly()
          This method initializes the hidden potential (and the corresponding parameters) randomly.
 void initializeHiddenUniformly()
          This method initializes the hidden parameters of the instance uniformly.
protected abstract  void initializeUsingPlugIn(int index, boolean freeParams, Sample[] data, double[][] weights)
          This method initializes the function using the data in some way.
protected  void initWithLength(boolean freeParams, int len)
          This method is used to create the underlying structure, e.g. paramRef.
 boolean isInitialized()
          This method can be used to determine whether the model is initialized.
 boolean isNormalized()
          This method returns whether the implemented score is already normalized to 1.
protected  void precomputeNorm()
          Precomputes the normalization constant.
protected  void setHiddenParameters(double[] params, int start)
          This method sets the hidden parameters of the model.
 void setParameters(double[] params, int start)
          This method sets the internal parameters to the values of params between start and start + this.getNumberOfParameters() - 1
protected  void setParametersForFunction(int index, double[] params, int start)
          This method allows setting the parameters for specific functions.
 StringBuffer toXML()
          This method returns an XML-representation of an instance of the implementing class.
 
Methods inherited from class de.jstacs.scoringFunctions.AbstractNormalizableScoringFunction
getAlphabetContainer, getInitialClassParam, getLength, getLogScore, getLogScoreAndPartialDerivation, isNormalized
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.scoringFunctions.NormalizableScoringFunction
getEss, getPartialNormalizationConstant
 
Methods inherited from interface de.jstacs.scoringFunctions.ScoringFunction
getInstanceName, getLogScoreAndPartialDerivation
 

Field Detail

paramRef

protected int[] paramRef
This array contains the references/indices for the parameters. Only the start index for each new function is stored.


optimizeHidden

protected boolean optimizeHidden
This boolean indicates whether to optimize the hidden variables of this instance. (It is not applied recursively.)


plugIn

protected boolean plugIn
This boolean indicates whether to use a plug-in strategy to initialize the instance.


function

protected NormalizableScoringFunction[] function
This array contains the internal functions that are used to determine the score.


hiddenParameter

protected double[] hiddenParameter
This array contains the hidden parameters of the instance.


logHiddenPotential

protected double[] logHiddenPotential
This array contains the logarithm of the hidden potentials of the instance


hiddenPotential

protected double[] hiddenPotential
This array contains the hidden potentials of the instance.


componentScore

protected double[] componentScore
This array is used while computing the score. It stores the scores of the components and is used to avoid creating a new array every time.


partNorm

protected double[] partNorm
This array contains the partial normalization constants, i.e. the normalization constant for each component.


norm

protected double norm
This double contains the normalization constant of the instance.


logHiddenNorm

protected double logHiddenNorm
This double contains the logarithm of the normalization constant of the hidden parameters of the instance.


logGammaSum

protected double logGammaSum
This double contains the sum of the logarithms of the gamma functions used in the prior.

See Also:
computeLogGammaSum()

dList

protected DoubleList[] dList
This array contains some DoubleLists that are used while computing the partial derivatives.


iList

protected IntList[] iList
This array contains some IntLists that are used while computing the partial derivatives.


isNormalized

protected boolean isNormalized
This boolean indicates whether this instance is a normalized one or not.

Constructor Detail

AbstractMixtureScoringFunction

public AbstractMixtureScoringFunction(int length,
                                      int starts,
                                      int dimension,
                                      boolean optimizeHidden,
                                      boolean plugIn,
                                      NormalizableScoringFunction... function)
                               throws CloneNotSupportedException
This constructor creates an AbstractMixtureScoringFunction.

Parameters:
length - the sequence length that should be modeled
starts - the number of starts that should be done in an optimization
dimension - the number of different mixture components
optimizeHidden - whether the parameters for the hidden variables should be optimized
plugIn - whether the initial parameters for an optimization should be related to the data or randomly drawn
function - the ScoringFunctions
Throws:
CloneNotSupportedException

AbstractMixtureScoringFunction

public AbstractMixtureScoringFunction(StringBuffer xml)
                               throws NonParsableException
This is the constructor for Storable.

Parameters:
xml - the xml representation
Throws:
NonParsableException - if the representation could not be parsed.
Method Detail

getMaxIndex

protected static final int getMaxIndex(double[] w)
Returns the index with maximal value in the array.

Parameters:
w - the array
Returns:
the index
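
A minimal sketch of such an index search is shown below. The class MaxIndexSketch is hypothetical and not the Jstacs implementation; resolving ties to the lowest index and rejecting empty arrays are assumptions of this sketch.

```java
/** Hypothetical helper; not the Jstacs implementation. */
final class MaxIndexSketch {

    /** Returns the index of the largest entry; ties resolve to the lowest index. */
    static int getMaxIndex(double[] w) {
        if (w.length == 0) {
            throw new IllegalArgumentException("array must not be empty");
        }
        int best = 0;
        for (int i = 1; i < w.length; i++) {
            if (w[i] > w[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Component log scores are typically negative; the largest one wins.
        System.out.println(getMaxIndex(new double[] { -3.2, -0.5, -1.7 })); // prints 1
    }
}
```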

computeLogGammaSum

protected void computeLogGammaSum()
This method is used to precompute the sum of the logarithm of the gamma functions that is used in the prior.


clone

public AbstractMixtureScoringFunction clone()
                                     throws CloneNotSupportedException
Description copied from interface: ScoringFunction
Creates a clone (deep copy) of the current ScoringFunction instance.

Specified by:
clone in interface ScoringFunction
Overrides:
clone in class AbstractNormalizableScoringFunction
Returns:
the cloned instance
Throws:
CloneNotSupportedException

cloneFunctions

protected void cloneFunctions(NormalizableScoringFunction[] originalFunctions)
                       throws CloneNotSupportedException
This method clones the given array of functions and enables the user to do some postprocessing. This method is only used in clone().

Parameters:
originalFunctions - the array of functions to be cloned
Throws:
CloneNotSupportedException

getHyperparameterForHiddenParameter

public abstract double getHyperparameterForHiddenParameter(int index)
This method returns the hyperparameter for the hidden parameter with index index.

Parameters:
index - the index
Returns:
the hyperparameter

getLogPriorTerm

public double getLogPriorTerm()
Description copied from interface: NormalizableScoringFunction
This method computes a value that is proportional to

getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior )

where prior is the prior for the parameters of this model.

Returns:
getEss()*Math.log( getNormalizationConstant() ) + Math.log( prior )
See Also:
NormalizableScoringFunction.getEss(), NormalizableScoringFunction.getNormalizationConstant()

addGradientOfLogPriorTerm

public void addGradientOfLogPriorTerm(double[] grad,
                                      int start)
                               throws Exception
Description copied from interface: NormalizableScoringFunction
This method computes the gradient of getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.

Parameters:
grad - the gradient
start - the start index in the grad array, where the partial derivations for the parameters of this model shall be entered
Throws:
Exception
See Also:
NormalizableScoringFunction.getLogPriorTerm()

getIndexOfMaximalComponentFor

public int getIndexOfMaximalComponentFor(Sequence seq,
                                         int start)
Returns the index of the component that has the greatest impact on the complete score

Parameters:
seq - the sequence
start - the start position
Returns:
the index of the component

getCurrentParameterValues

public double[] getCurrentParameterValues()
                                   throws Exception
Description copied from interface: ScoringFunction
Returns a double array of dimension getNumberOfParameters() containing the current parameter values. If one wants to use these parameters to start an optimization, it is highly recommended to invoke ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]) beforehand. After an optimization, this method can be used to get the current parameter values.

Returns:
the current parameter values
Throws:
Exception - is thrown if no parameters exist, yet

getLogScore

public double getLogScore(Sequence seq,
                          int start)
Description copied from interface: ScoringFunction
Returns the log score for the sequence

Parameters:
seq - the sequence
start - the start position in the sequence
Returns:
the log score for the sequence

getNormalizationConstant

public final double getNormalizationConstant()
Description copied from interface: NormalizableScoringFunction
Returns the sum of the scores over all sequences of the event space.

Returns:
the normalization constant Z
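
For a mixture, the normalization constant can plausibly be built from the fields hiddenPotential and partNorm. The sketch below assumes that each hidden potential is the exponential of its hidden parameter and that Z is the potential-weighted sum of the per-component constants. Both assumptions, as well as the class MixtureNormSketch, are illustrative and not taken from the Jstacs source.

```java
/** Hypothetical sketch; the formulas are assumptions, not the Jstacs source. */
final class MixtureNormSketch {

    /** Assumed mapping from hidden parameters to potentials: exp(parameter). */
    static double[] hiddenPotential(double[] hiddenParameter) {
        double[] potential = new double[hiddenParameter.length];
        for (int i = 0; i < potential.length; i++) {
            potential[i] = Math.exp(hiddenParameter[i]);
        }
        return potential;
    }

    /** Assumed normalization constant: sum_i hiddenPotential[i] * partNorm[i]. */
    static double normalizationConstant(double[] hiddenPotential, double[] partNorm) {
        double norm = 0;
        for (int i = 0; i < hiddenPotential.length; i++) {
            norm += hiddenPotential[i] * partNorm[i];
        }
        return norm;
    }

    public static void main(String[] args) {
        double[] potential = hiddenPotential(new double[] { 0.0, Math.log(2) });
        double[] partNorm = { 3.0, 4.0 }; // made-up per-component constants
        System.out.println(normalizationConstant(potential, partNorm));
    }
}
```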

getNumberOfComponents

public final int getNumberOfComponents()
Returns the number of different components.

Returns:
the number of different components.

getNumberOfParameters

public final int getNumberOfParameters()
Description copied from interface: ScoringFunction
The number of parameters in this scoring function. If the number of parameters is not known yet, the method returns UNKNOWN.

Returns:
the number of parameters in this scoring function
See Also:
ScoringFunction.UNKNOWN

getNumberOfRecommendedStarts

public final int getNumberOfRecommendedStarts()
Description copied from interface: ScoringFunction
This method returns the number of recommended optimization starts. The standard implementation returns 1.

Specified by:
getNumberOfRecommendedStarts in interface ScoringFunction
Overrides:
getNumberOfRecommendedStarts in class AbstractNormalizableScoringFunction
Returns:
the number of recommended optimization starts

getProbsForComponent

public double[] getProbsForComponent(Sequence seq)
Returns the probabilities for each component

Parameters:
seq - the sequence
Returns:
an array containing the probability of component i (=p(i|class,seq)) in entry i
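
One common way to obtain such component probabilities is to normalize the joint log scores (log hidden potential plus component log score) with a softmax. The sketch below illustrates that idea. ComponentPosteriorSketch is a hypothetical class, and it assumes that at least one joint log score is finite; the actual Jstacs computation may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch; not part of Jstacs. */
final class ComponentPosteriorSketch {

    /**
     * Turns joint log scores (one per component) into probabilities that sum to 1.
     * Assumes at least one entry is finite.
     */
    static double[] posterior(double[] jointLogScore) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : jointLogScore) {
            max = Math.max(max, v);
        }
        double[] p = new double[jointLogScore.length];
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            p[i] = Math.exp(jointLogScore[i] - max); // shift by max for stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] joint = { Math.log(0.05), Math.log(0.30) };
        System.out.println(Arrays.toString(posterior(joint)));
    }
}
```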

getScoringFunctions

public NormalizableScoringFunction[] getScoringFunctions()
                                                  throws CloneNotSupportedException
Returns a deep copy of all internally used ScoringFunctions.

Returns:
a deep copy of all internally used ScoringFunctions
Throws:
CloneNotSupportedException

getSizeOfEventSpaceForRandomVariablesOfParameter

public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: NormalizableScoringFunction
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA-alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...

Parameters:
index - the index of the parameter
Returns:
the size of the event space

initializeFunction

public void initializeFunction(int index,
                               boolean freeParams,
                               Sample[] data,
                               double[][] weights)
                        throws Exception
Description copied from interface: ScoringFunction
This method creates the underlying structure of the scoring function.

Parameters:
index - the index of the class the scoring function models
freeParams - if true, the (reduced) parameterization is used
data - the samples
weights - the weights of the sequences in the samples
Throws:
Exception

initializeUsingPlugIn

protected abstract void initializeUsingPlugIn(int index,
                                              boolean freeParams,
                                              Sample[] data,
                                              double[][] weights)
                                       throws Exception
This method initializes the function using the data in some way.

Parameters:
index - the class index
freeParams - if true, the (reduced) parameterization is used
data - the data
weights - the weights
Throws:
Exception - if the initialization could not be done
See Also:
ScoringFunction.initializeFunction(int, boolean, Sample[], double[][])

initializeFunctionRandomly

public void initializeFunctionRandomly(boolean freeParams)
                                throws Exception
Description copied from interface: ScoringFunction
This method initializes the scoring function randomly. It has to create the underlying structure of the scoring function.

Parameters:
freeParams - if true, the (reduced) parameterization is used
Throws:
Exception

initializeHiddenPotentialRandomly

protected void initializeHiddenPotentialRandomly()
This method initializes the hidden potential (and the corresponding parameters) randomly.


isInitialized

public boolean isInitialized()
Description copied from interface: ScoringFunction
This method can be used to determine whether the model is initialized. If the model is not initialized, you should invoke the method ScoringFunction.initializeFunction(int, boolean, Sample[], double[][]).

Returns:
true if the model is initialized

setParameters

public void setParameters(double[] params,
                          int start)
Description copied from interface: ScoringFunction
This method sets the internal parameters to the values of params between start and start + this.getNumberOfParameters() - 1

Parameters:
params - the parameters
start - the start index
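
The index range described above can be illustrated with a small stand-in class. SetParametersSketch is hypothetical and only mimics the contract: it reads exactly getNumberOfParameters() values from params, beginning at start, and ignores everything outside that window.

```java
import java.util.Arrays;

/** Hypothetical stand-in that mimics the documented index range only. */
final class SetParametersSketch {

    private final double[] internal;

    SetParametersSketch(int numberOfParameters) {
        internal = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return internal.length;
    }

    /** Copies params[start] .. params[start + getNumberOfParameters() - 1]. */
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, internal, 0, internal.length);
    }

    /** Returns a copy so that callers cannot modify the internal state. */
    double[] getCurrentParameterValues() {
        return internal.clone();
    }

    public static void main(String[] args) {
        SetParametersSketch f = new SetParametersSketch(3);
        // A larger vector, e.g. one shared by several functions in an optimizer.
        double[] all = { 9.0, 1.0, 2.0, 3.0, 9.0 };
        f.setParameters(all, 1);
        System.out.println(Arrays.toString(f.getCurrentParameterValues())); // prints [1.0, 2.0, 3.0]
    }
}
```

Passing a start offset lets several functions share a single parameter vector without copying sub-arrays first.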

initializeHiddenUniformly

public void initializeHiddenUniformly()
This method initializes the hidden parameters of the instance uniformly.


setHiddenParameters

protected void setHiddenParameters(double[] params,
                                   int start)
This method sets the hidden parameters of the model.

Parameters:
params - the parameter vector
start - the start index

setParametersForFunction

protected void setParametersForFunction(int index,
                                        double[] params,
                                        int start)
This method allows setting the parameters for specific functions.

Parameters:
index - the function index
params - the parameter vector
start - the start index

toXML

public final StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML-representation of an instance of the implementing class.

Returns:
the XML-representation

fromXML

protected final void fromXML(StringBuffer b)
                      throws NonParsableException
Description copied from class: AbstractNormalizableScoringFunction
This method is called in the constructor to create a scoring function from a StringBuffer

Specified by:
fromXML in class AbstractNormalizableScoringFunction
Parameters:
b - the XML representation
Throws:
NonParsableException - if the StringBuffer could not be parsed.

getFurtherInformation

protected StringBuffer getFurtherInformation()
This method is used to append further information of the instance to the XML representation. This method is designed to allow subclasses to add information.

Returns:
the further information as XML in a StringBuffer

extractFurtherInformation

protected void extractFurtherInformation(StringBuffer xml)
                                  throws NonParsableException
This method is the opposite of getFurtherInformation().

Parameters:
xml - the StringBuffer containing the information
Throws:
NonParsableException - if the StringBuffer could not be parsed

getIndices

protected int[] getIndices(int index)
This method computes the relative indices for a parameter index.

Parameters:
index - the parameter index
Returns:
the indices
See Also:
paramRef
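
Since paramRef stores only the start index of each function's parameters, a global parameter index has to be translated into a function index and an offset within that function. The following sketch shows one way to do this. ParamRefSketch is hypothetical, and the layout (one extra final entry holding the total number of parameters) as well as the returned pair are assumptions rather than the documented Jstacs behavior.

```java
/** Hypothetical sketch of an offset table; layout and return format are assumptions. */
final class ParamRefSketch {

    /** Builds start offsets; the extra last entry is the total number of parameters. */
    static int[] buildParamRef(int[] paramsPerFunction) {
        int[] ref = new int[paramsPerFunction.length + 1];
        for (int i = 0; i < paramsPerFunction.length; i++) {
            ref[i + 1] = ref[i] + paramsPerFunction[i];
        }
        return ref;
    }

    /** Maps a global parameter index to { function index, index within that function }. */
    static int[] getIndices(int[] paramRef, int index) {
        if (index < 0 || index >= paramRef[paramRef.length - 1]) {
            throw new IndexOutOfBoundsException("parameter index " + index);
        }
        int f = 0;
        while (index >= paramRef[f + 1]) {
            f++; // skip functions whose parameter block ends before index
        }
        return new int[] { f, index - paramRef[f] };
    }

    public static void main(String[] args) {
        int[] ref = buildParamRef(new int[] { 3, 4, 2 }); // offsets 0, 3, 7 and total 9
        int[] pair = getIndices(ref, 5);
        System.out.println(pair[0] + ", " + pair[1]); // prints 1, 2
    }
}
```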

getXMLTag

protected String getXMLTag()
This method returns the XML tag of the instance that is used to build an XML representation.

Returns:
the XML tag of the instance

init

protected void init(boolean freeParams)
This method creates the underlying structure for the parameters.

Parameters:
freeParams - whether to use free parameters or all

initWithLength

protected final void initWithLength(boolean freeParams,
                                    int len)
This method is used to create the underlying structure, e.g. paramRef

Parameters:
freeParams - whether to use free parameters or all
len - the length of the paramRef array

computeHiddenParameter

protected void computeHiddenParameter(double[] statistic)
This method has to be invoked during an initialization.

Parameters:
statistic - a statistic for the initialization of the hidden parameters
See Also:
ScoringFunction.initializeFunction(int, boolean, Sample[], double[][])

precomputeNorm

protected void precomputeNorm()
Precomputes the normalization constant.


getNormalizationConstantForComponent

protected abstract double getNormalizationConstantForComponent(int i)
Computes the normalization constant for the component i

Parameters:
i - the index of the component
Returns:
the normalization constant

fillComponentScores

protected abstract void fillComponentScores(Sequence seq,
                                            int start)
Fills the internal array componentScore with the log scores of the components.

Parameters:
seq - the sequence
start - the start position

isNormalized

public boolean isNormalized()
Description copied from interface: NormalizableScoringFunction
This method returns whether the implemented score is already normalized to 1. The standard implementation returns false.

Specified by:
isNormalized in interface NormalizableScoringFunction
Overrides:
isNormalized in class AbstractNormalizableScoringFunction
Returns:
whether the implemented score is already normalized to 1

getFunction

public NormalizableScoringFunction getFunction(int index)
                                        throws CloneNotSupportedException
This method returns a specific internal function

Parameters:
index - the index of the function
Returns:
a clone of the function
Throws:
CloneNotSupportedException - if the function could not be cloned

getFunctions

public NormalizableScoringFunction[] getFunctions()
                                           throws CloneNotSupportedException
This method returns an array of clones of the internally used functions.

Returns:
an array of clones of the internally used functions
Throws:
CloneNotSupportedException - if at least one function could not be cloned