de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif
Class HiddenMotifMixture

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture
All Implemented Interfaces:
MotifDiscoverer, SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable
Direct Known Subclasses:
ZOOPSTrainSM

public abstract class HiddenMotifMixture
extends AbstractMixtureTrainSM
implements MotifDiscoverer

This is the main class that every generative motif discoverer should extend.

Author:
Jens Keilwagen

Nested Class Summary
 
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization
 
Nested classes/interfaces inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer
MotifDiscoverer.KindOfProfile
 
Field Summary
protected  PositionPrior posPrior
          The prior for the positions.
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
protected HiddenMotifMixture(StringBuffer xml)
          The standard constructor for the interface Storable.
protected HiddenMotifMixture(TrainableStatisticalModel[] models, boolean[] optimzeArray, int components, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest)
          Creates a new HiddenMotifMixture.
 
Method Summary
protected  void checkLength(int index, int l)
          This method checks whether the length l of the model with index index is suitable for the current instance.
 HiddenMotifMixture clone()
          Follows the conventions of Object's clone()-method.
protected  Sequence[] emitDataSetUsingCurrentParameterSet(int n, int... lengths)
          Standard implementation throwing an OperationNotSupportedException.
protected  void extractFurtherInformation(StringBuffer xml)
          This method is used in the subclasses to extract further information from the XML representation and to set these as values of the instance.
protected  StringBuffer getFurtherInformation()
          This method is used in the subclasses to append further information to the XML representation.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
abstract  int getMinimalSequenceLength()
          Returns the minimal length that a sequence or a data set must have.
protected  void getNewParameters(int iteration, double[][] seqWeights, double[] w)
          This method trains the internal models on the internal data set and the given weights.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 void train(DataSet data, double[] weights)
          Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights.
abstract  void trainBgModel(DataSet data, double[] weights)
          This method trains the background model.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
algorithmHasBeenRun, checkModelsForGibbsSampling, continueIterations, continueIterations, createSeqWeightsArray, doFirstIteration, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, finalize, fromXML, getCharacteristics, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogProbUsingCurrentParameterSetFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNewWeights, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setTrainData, setWeights, swap, toXML
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer
getGlobalIndexOfMotifInComponent, getIndexOfMaximalComponentFor, getMotifLength, getNumberOfComponents, getNumberOfMotifs, getNumberOfMotifsInComponent, getProfileOfScoresFor, getStrandProbabilitiesFor
 
Methods inherited from interface de.jstacs.Storable
toXML
 

Field Detail

posPrior

protected PositionPrior posPrior
The prior for the positions.

Constructor Detail

HiddenMotifMixture

protected HiddenMotifMixture(TrainableStatisticalModel[] models,
                             boolean[] optimzeArray,
                             int components,
                             int starts,
                             boolean estimateComponentProbs,
                             double[] componentHyperParams,
                             double[] weights,
                             PositionPrior posPrior,
                             AbstractMixtureTrainSM.Algorithm algorithm,
                             double alpha,
                             TerminationCondition tc,
                             AbstractMixtureTrainSM.Parameterization parametrization,
                             int initialIteration,
                             int stationaryIteration,
                             BurnInTest burnInTest)
                      throws CloneNotSupportedException,
                             IllegalArgumentException,
                             WrongAlphabetException
Creates a new HiddenMotifMixture. This constructor can be used for any algorithm since it takes all necessary values as parameters.

Parameters:
models - the single models building the HiddenMotifMixture. If the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent. The models that are used for the flanking sequences have to be able to score sequences of arbitrary length.
optimzeArray - an array of switches indicating whether or not to train the corresponding model
components - the number of components (e.g. for ZOOPS this is 2)
starts - the number of times the algorithm will be started in the train-method, at least 1
estimateComponentProbs - the switch for estimating the component probabilities in the algorithm or holding them fixed; if the component probabilities are held fixed, the values of weights will be used, otherwise the componentHyperParams will be incorporated in the adjustment
componentHyperParams - the hyperparameters for the component assignment prior
  • will only be used if estimateComponentProbs == true
  • the array has to be null or has to have length dimension
  • if it is null or an array with all values zero (0), ML is used
  • otherwise (all values positive) a prior is used (MAP, MP, ...)
  • depends on the parameterization
weights - null or the weights for the components (then weights.length == dimension)
posPrior - this object determines the positional distribution that shall be used
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM
the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM
the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM
the type of the component probability parameterization
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
the positive length of the initial sampling phase (at least 1, at most stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
the positive length of the stationary phase (at least 1), summed over all starts, i.e. the number of parameter sets that are used in the approximation
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
the test that will be used to determine the length of the burn-in phase
Throws:
IllegalArgumentException - if
  • the models are not able to score the sequence of the corresponding length
  • weights != null && weights.length != 2
  • weights != null and there exists an i where weights[i] < 0
  • starts < 1
  • componentHyperParams are not correct
  • the algorithm specific parameters are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models cannot be cloned
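
The documented constraints on starts, weights and componentHyperParams can be sketched as a plain argument check. This is an illustration only, not the Jstacs implementation: the class MixtureArgumentCheck and its methods are hypothetical, and it follows the parameter description in requiring weights.length == dimension.

```java
/** Hypothetical helper that mirrors the documented constraints; not part of Jstacs. */
final class MixtureArgumentCheck {

    /** Throws an IllegalArgumentException if one of the documented constraints is violated. */
    static void check(int dimension, int starts, double[] weights, double[] componentHyperParams) {
        if (starts < 1) {
            throw new IllegalArgumentException("starts must be at least 1");
        }
        if (weights != null) {
            if (weights.length != dimension) {
                throw new IllegalArgumentException("weights.length must equal dimension");
            }
            for (double w : weights) {
                if (w < 0) {
                    throw new IllegalArgumentException("weights must be non-negative");
                }
            }
        }
        if (componentHyperParams != null) {
            if (componentHyperParams.length != dimension) {
                throw new IllegalArgumentException("componentHyperParams.length must equal dimension");
            }
            int zeros = 0;
            int positives = 0;
            for (double h : componentHyperParams) {
                if (h == 0) {
                    zeros++;
                } else if (h > 0) {
                    positives++;
                }
            }
            // Valid cases: all values zero (ML) or all values positive (a prior is used).
            if (zeros != dimension && positives != dimension) {
                throw new IllegalArgumentException("componentHyperParams must be all zero or all positive");
            }
        }
    }

    /** Convenience wrapper: true if the arguments pass the check, false otherwise. */
    static boolean isValid(int dimension, int starts, double[] weights, double[] componentHyperParams) {
        try {
            check(dimension, starts, weights, componentHyperParams);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(2, 1, new double[] {0.5, 0.5}, null)); // two components, one start
        System.out.println(isValid(2, 0, null, null));                    // starts < 1 is rejected
    }
}
```

The algorithm-specific parameters (alpha, tc, initialIteration, stationaryIteration, burnInTest) are left out of this sketch.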

HiddenMotifMixture

protected HiddenMotifMixture(StringBuffer xml)
                      throws NonParsableException
The standard constructor for the interface Storable. Creates a new HiddenMotifMixture out of its XML representation.

Parameters:
xml - the XML representation of the model as StringBuffer
Throws:
NonParsableException - if the StringBuffer cannot be parsed
Method Detail

clone

public HiddenMotifMixture clone()
                         throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface MotifDiscoverer
Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class AbstractMixtureTrainSM
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X that directly inherits from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as follows:
1. Object o = (X)super.clone();
2. all additional member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
See Also:
Cloneable
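
The three-step clone convention described above can be sketched with a small stand-alone class. ToyModel and its fields are hypothetical and only illustrate the pattern; the sketch catches CloneNotSupportedException internally to stay self-contained, whereas the documented method declares it.

```java
/** Hypothetical class illustrating the clone convention; not part of Jstacs. */
final class ToyModel implements Cloneable {
    private final int length;    // simple data type: super.clone() copies the value
    private double[] parameters; // array member: has to be deeply copied

    ToyModel(int length, double[] parameters) {
        this.length = length;
        this.parameters = parameters.clone();
    }

    @Override
    public ToyModel clone() {
        try {
            // Step 1: let Object.clone() create the field-by-field copy.
            ToyModel copy = (ToyModel) super.clone();
            // Step 2: deeply copy every member that is not of a simple data type.
            copy.parameters = parameters.clone();
            // Step 3: return the copy.
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen here because ToyModel implements Cloneable.
            throw new AssertionError(e);
        }
    }

    int getLength() {
        return length;
    }

    double getParameter(int index) {
        return parameters[index];
    }

    void setParameter(int index, double value) {
        parameters[index] = value;
    }

    public static void main(String[] args) {
        ToyModel original = new ToyModel(3, new double[] {0.1, 0.2});
        ToyModel copy = original.clone();
        copy.setParameter(0, 0.9);
        // The original keeps its own array, so it still holds 0.1 at index 0.
        System.out.println(original.getParameter(0) + " " + copy.getParameter(0));
    }
}
```

Without step 2, both objects would share one parameters array, and changing the copy would silently change the original.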

getFurtherInformation

protected StringBuffer getFurtherInformation()
Description copied from class: AbstractMixtureTrainSM
This method is used in the subclasses to append further information to the XML representation.

Overrides:
getFurtherInformation in class AbstractMixtureTrainSM
Returns:
a part of the XML representation
See Also:
AbstractMixtureTrainSM.extractFurtherInformation(StringBuffer)

extractFurtherInformation

protected void extractFurtherInformation(StringBuffer xml)
                                  throws NonParsableException
Description copied from class: AbstractMixtureTrainSM
This method is used in the subclasses to extract further information from the XML representation and to set these as values of the instance.

Overrides:
extractFurtherInformation in class AbstractMixtureTrainSM
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the XML representation is not parsable
See Also:
AbstractMixtureTrainSM.getFurtherInformation()
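
getFurtherInformation and extractFurtherInformation form a write/read pair: whatever the first appends, the second has to recover. The following toy sketch shows that round trip with plain string handling. The class FurtherInfoSketch, the priorName tag and the use of IllegalArgumentException in place of NonParsableException are assumptions made for illustration; the sketch does not use the Jstacs XML utilities and performs no escaping.

```java
/** Hypothetical class showing the append/extract round trip; not part of Jstacs. */
final class FurtherInfoSketch {
    private static final String OPEN = "<priorName>";
    private static final String CLOSE = "</priorName>";

    private String priorName; // stand-in for additional state of a subclass

    FurtherInfoSketch(String priorName) {
        this.priorName = priorName;
    }

    /** Returns the additional state as a tagged fragment of the XML representation. */
    StringBuffer getFurtherInformation() {
        StringBuffer xml = new StringBuffer();
        xml.append(OPEN).append(priorName).append(CLOSE);
        return xml;
    }

    /** Reads the fragment written by getFurtherInformation() and sets the state. */
    void extractFurtherInformation(StringBuffer xml) {
        int start = xml.indexOf(OPEN);
        int end = xml.indexOf(CLOSE);
        if (start < 0 || end < start) {
            // The documented method throws a NonParsableException in this situation.
            throw new IllegalArgumentException("missing priorName element");
        }
        priorName = xml.substring(start + OPEN.length(), end);
    }

    String getPriorName() {
        return priorName;
    }

    public static void main(String[] args) {
        FurtherInfoSketch source = new FurtherInfoSketch("uniform");
        FurtherInfoSketch target = new FurtherInfoSketch("unset");
        target.extractFurtherInformation(source.getFurtherInformation());
        System.out.println(target.getPriorName());
    }
}
```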

train

public void train(DataSet data,
                  double[] weights)
           throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as there are sequences in the data set. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally. That is, the result of the call sequence train(data1); train(data2) should be a model fully trained over data2, not over data1+data2. All parameters of the model were given by the call of the constructor.

Specified by:
train in interface TrainableStatisticalModel
Overrides:
train in class AbstractMixtureTrainSM
Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
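
The weight handling and the non-incremental contract of train can be illustrated with a deliberately simple stand-in. ToyMeanModel is hypothetical: it replaces the DataSet by a double[] and the statistical model by a weighted mean, so it only shows the contract, not how Jstacs trains a mixture.

```java
/** Hypothetical toy model standing in for a TrainableStatisticalModel; not part of Jstacs. */
final class ToyMeanModel {
    private double mean = Double.NaN; // the only trained parameter

    /** Trains on data only; weights == null means that every weight is one. */
    void train(double[] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("weights.length must equal the number of elements");
        }
        // Fresh accumulators on every call: nothing from an earlier training survives.
        double sum = 0;
        double total = 0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            if (w < 0) {
                throw new IllegalArgumentException("each weight should be non-negative");
            }
            sum += w * data[i];
            total += w;
        }
        mean = sum / total;
    }

    double getMean() {
        return mean;
    }

    public static void main(String[] args) {
        ToyMeanModel model = new ToyMeanModel();
        model.train(new double[] {1, 3}, null);
        model.train(new double[] {10, 20}, null);
        // The second call replaces the first: the mean is computed from {10, 20} only.
        System.out.println(model.getMean());
    }
}
```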

getNewParameters

protected void getNewParameters(int iteration,
                                double[][] seqWeights,
                                double[] w)
                         throws Exception
Description copied from class: AbstractMixtureTrainSM
This method trains the internal models on the internal data set and the given weights.

Overrides:
getNewParameters in class AbstractMixtureTrainSM
Parameters:
iteration - the number of times this method has been invoked
seqWeights - the weights for each model and sequence
w - the weights for the components
Throws:
Exception - if the training of the internal models went wrong

trainBgModel

public abstract void trainBgModel(DataSet data,
                                  double[] weights)
                           throws Exception
This method trains the background model. This can be useful if the background model is not trained during the EM-algorithm.

Parameters:
data - the data set
weights - the weights
Throws:
Exception - if something went wrong

checkLength

protected void checkLength(int index,
                           int l)
Description copied from class: AbstractMixtureTrainSM
This method checks whether the length l of the model with index index is suitable for the current instance. Otherwise an IllegalArgumentException is thrown.

Overrides:
checkLength in class AbstractMixtureTrainSM
Parameters:
index - the index of the model
l - the length of the model

getMinimalSequenceLength

public abstract int getMinimalSequenceLength()
Returns the minimal length that a sequence or a data set must have.

Returns:
the minimal length that a sequence or a data set must have

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Overrides:
getInstanceName in class AbstractMixtureTrainSM
Returns:
a short instance name

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Specified by:
toString in interface SequenceScore
Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

emitDataSetUsingCurrentParameterSet

protected Sequence[] emitDataSetUsingCurrentParameterSet(int n,
                                                         int... lengths)
                                                  throws Exception
Standard implementation throwing an OperationNotSupportedException.

Specified by:
emitDataSetUsingCurrentParameterSet in class AbstractMixtureTrainSM
Parameters:
n - the number of sequences to be sampled
lengths - the corresponding lengths
Returns:
an array of sequences
Throws:
Exception - if it was impossible to sample the sequences
See Also:
StatisticalModel.emitDataSet(int, int...)