de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.shared
Class SharedStructureMixture

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
          extended by de.jstacs.sequenceScores.statisticalModels.trainable.mixture.MixtureTrainSM
              extended by de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.shared.SharedStructureMixture
All Implemented Interfaces:
SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable

public class SharedStructureMixture
extends MixtureTrainSM

This class handles a mixture of models that share the same structure. The mixture is learned via EM. A well-known example is a mixture of trees.

Author:
Jens Keilwagen

Nested Class Summary
 
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization
 
Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
protected SharedStructureMixture(FSDAGTrainSM[] m, StructureLearner.ModelType model, byte order, int starts, boolean estimateComponentProbs, double[] weights, double alpha, TerminationCondition tc)
          Creates a new SharedStructureMixture instance with all relevant values.
  SharedStructureMixture(FSDAGTrainSM[] m, StructureLearner.ModelType model, byte order, int starts, double[] weights, double alpha, TerminationCondition tc)
          Creates a new SharedStructureMixture instance with fixed component weights.
  SharedStructureMixture(FSDAGTrainSM[] m, StructureLearner.ModelType model, byte order, int starts, double alpha, TerminationCondition tc)
          Creates a new SharedStructureMixture instance which estimates the component probabilities/weights.
  SharedStructureMixture(StringBuffer xml)
          The standard constructor for the interface Storable.
 
Method Summary
 SharedStructureMixture clone()
          Follows the conventions of Object's clone()-method.
protected  void fromXML(StringBuffer representation)
          This method should only be used by the constructor that works on a StringBuffer.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
protected  void getNewParameters(int iteration, double[][] seqWeights, double[] w)
          This method trains the internal models on the internal data set and the given weights.
 String getStructure()
          Returns a String representation of the structure of the used models.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.MixtureTrainSM
doFirstIteration, doFirstIteration, emitDataSetUsingCurrentParameterSet, getLogProbUsingCurrentParameterSetFor, getNewWeights, setTrainData, toString
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
algorithmHasBeenRun, checkLength, checkModelsForGibbsSampling, continueIterations, continueIterations, createSeqWeightsArray, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, extractFurtherInformation, finalize, getCharacteristics, getFurtherInformation, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, train
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SharedStructureMixture

public SharedStructureMixture(FSDAGTrainSM[] m,
                              StructureLearner.ModelType model,
                              byte order,
                              int starts,
                              double alpha,
                              TerminationCondition tc)
                       throws IllegalArgumentException,
                              WrongAlphabetException,
                              CloneNotSupportedException
Creates a new SharedStructureMixture instance which estimates the component probabilities/weights.

Parameters:
m - the single models building the mixture model
model - the type of the model
order - the order of the model
starts - the number of times the algorithm will be started in the train-method, at least 1
alpha - the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (uniform distribution on a simplex) is recommended
tc - the TerminationCondition for stopping the EM algorithm; TerminationCondition.isSimple() has to return true for tc
Throws:
IllegalArgumentException - if
  • the models cannot score sequences of the same length
  • dimension < 1
  • weights != null && weights.length != dimension
  • weights != null and there exists an i where weights[i] < 0
  • starts < 1
  • componentHyperParams (hyperparameters for the component assignment prior) are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
StructureLearner.ModelType, SharedStructureMixture(FSDAGTrainSM[], de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.StructureLearner.ModelType, byte, int, boolean, double[], double, TerminationCondition)
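
The recommendation alpha = 1 corresponds to a uniform distribution on the simplex. As a minimal sketch of what such a draw looks like, the following self-contained code normalizes independent Exp(1) variates, which yields a Dirichlet(1, ..., 1) sample. The class UniformSimplexDraw and its method are hypothetical names introduced for illustration; how Jstacs draws the initial gammas internally is not described on this page.

```java
// Hypothetical helper for illustration only; it is not part of Jstacs.
final class UniformSimplexDraw {

    /**
     * Draws one vector from Dirichlet(1, ..., 1), i.e. uniformly from the
     * simplex, by normalizing independent Exp(1) variates.
     */
    static double[] draw(int components, java.util.Random rng) {
        double[] gamma = new double[components];
        double sum = 0;
        for (int i = 0; i < components; i++) {
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite
            gamma[i] = -Math.log(1.0 - rng.nextDouble());
            sum += gamma[i];
        }
        // Assumes sum > 0, which holds unless every variate is exactly 0
        for (int i = 0; i < components; i++) {
            gamma[i] /= sum;
        }
        return gamma;
    }
}
```

Each returned vector is non-negative and sums to 1, so it can serve as an initial soft assignment of one sequence to the components.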

SharedStructureMixture

public SharedStructureMixture(FSDAGTrainSM[] m,
                              StructureLearner.ModelType model,
                              byte order,
                              int starts,
                              double[] weights,
                              double alpha,
                              TerminationCondition tc)
                       throws IllegalArgumentException,
                              WrongAlphabetException,
                              CloneNotSupportedException
Creates a new SharedStructureMixture instance with fixed component weights.

Parameters:
m - the single models building the mixture model
model - the type of the model
order - the order of the model
starts - the number of times the algorithm will be started in the train-method, at least 1
weights - null or the weights for the components (then weights.length == m.length)
alpha - the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (uniform distribution on a simplex) is recommended
tc - the TerminationCondition for stopping the EM algorithm; TerminationCondition.isSimple() has to return true for tc
Throws:
IllegalArgumentException - if
  • the models cannot score sequences of the same length
  • dimension < 1
  • weights != null && weights.length != dimension
  • weights != null and there exists an i where weights[i] < 0
  • starts < 1
  • componentHyperParams (hyperparameters for the component assignment prior) are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
StructureLearner.ModelType, SharedStructureMixture(FSDAGTrainSM[], de.jstacs.sequenceScores.statisticalModels.trainable.discrete.inhomogeneous.StructureLearner.ModelType, byte, int, boolean, double[], double, TerminationCondition)
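
The argument conditions listed under IllegalArgumentException can be checked before calling the constructor. The sketch below mirrors only the conditions on dimension, weights and starts that are spelled out above. MixtureArgumentCheck is a hypothetical helper, not a Jstacs class, and the checks on model lengths and componentHyperParams are deliberately left out because their details are not given here.

```java
// Hypothetical helper for illustration only; it is not part of Jstacs.
final class MixtureArgumentCheck {

    /** Throws IllegalArgumentException for the documented invalid arguments. */
    static void check(int dimension, double[] weights, int starts) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension < 1");
        }
        if (weights != null) {
            if (weights.length != dimension) {
                throw new IllegalArgumentException("weights.length != dimension");
            }
            for (int i = 0; i < weights.length; i++) {
                if (weights[i] < 0) {
                    throw new IllegalArgumentException("weights[" + i + "] < 0");
                }
            }
        }
        if (starts < 1) {
            throw new IllegalArgumentException("starts < 1");
        }
    }

    /** Convenience wrapper that reports whether the arguments pass all checks. */
    static boolean isValid(int dimension, double[] weights, int starts) {
        try {
            check(dimension, weights, starts);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Here dimension is assumed to be the number of component models, so weights, if given, needs one non-negative entry per model.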

SharedStructureMixture

protected SharedStructureMixture(FSDAGTrainSM[] m,
                                 StructureLearner.ModelType model,
                                 byte order,
                                 int starts,
                                 boolean estimateComponentProbs,
                                 double[] weights,
                                 double alpha,
                                 TerminationCondition tc)
                          throws IllegalArgumentException,
                                 WrongAlphabetException,
                                 CloneNotSupportedException
Creates a new SharedStructureMixture instance with all relevant values. This constructor is used by the other main constructors.

Parameters:
m - the single models building the mixture model
model - the type of the model
order - the order of the model
starts - the number of times the algorithm will be started in the train-method, at least 1
estimateComponentProbs - the switch that selects whether the component probabilities are estimated by the algorithm or held fixed; if they are held fixed, the values of weights are used, otherwise the componentHyperParams (hyperparameters for the component assignment prior) are incorporated in the adjustment
weights - null or the weights for the components (then weights.length == m.length)
alpha - the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (uniform distribution on a simplex) is recommended
tc - the TerminationCondition for stopping the EM algorithm; TerminationCondition.isSimple() has to return true for tc
Throws:
IllegalArgumentException - if
  • the models cannot score sequences of the same length
  • dimension < 1
  • weights != null && weights.length != dimension
  • weights != null and there exists an i where weights[i] < 0
  • starts < 1
  • componentHyperParams (hyperparameters for the component assignment prior) are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
StructureLearner.ModelType, MixtureTrainSM.MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest)
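
The effect of the estimateComponentProbs switch can be illustrated with a small sketch. It assumes a pseudocount-style update in which the hyperparameters are added to the summed per-sequence component weights before normalizing. That formula, the class ComponentProbUpdate and all parameter names are assumptions made for illustration; the exact adjustment used by Jstacs is not stated on this page.

```java
// Hypothetical helper for illustration only; it is not part of Jstacs.
final class ComponentProbUpdate {

    /**
     * gamma[i][n] is the weight of component i for sequence n.
     * If estimate is false, the fixed weights are returned unchanged (as a copy).
     * Otherwise the hyperparameters act as pseudocounts (an assumed formula).
     */
    static double[] update(boolean estimate, double[] fixedWeights,
                           double[][] gamma, double[] hyperParams) {
        if (!estimate) {
            return fixedWeights.clone();
        }
        int k = gamma.length;
        double[] p = new double[k];
        double sum = 0;
        for (int i = 0; i < k; i++) {
            p[i] = hyperParams[i];
            for (double g : gamma[i]) {
                p[i] += g;
            }
            sum += p[i];
        }
        // Normalize so that the component probabilities sum to 1
        for (int i = 0; i < k; i++) {
            p[i] /= sum;
        }
        return p;
    }
}
```

With all hyperparameters set to 0, the estimate reduces to the average weight each component receives across the sequences.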

SharedStructureMixture

public SharedStructureMixture(StringBuffer xml)
                       throws NonParsableException
The standard constructor for the interface Storable. Creates a new SharedStructureMixture from its XML representation.

Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the SharedStructureMixture could not be reconstructed from the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, MixtureTrainSM.MixtureTrainSM(StringBuffer)

Method Detail

clone

public SharedStructureMixture clone()
                             throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class AbstractMixtureTrainSM
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X that directly inherits from AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. X o = (X) super.clone();
2. deeply copy all additional member variables of o defined by X that are not of simple data types like int, double, ...
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
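
The three-step clone convention described under Returns can be sketched with plain Java classes. Base and X below are hypothetical stand-ins that do not depend on Jstacs: Base plays the role of the superclass, and X adds one array member that needs a deep copy.

```java
// Hypothetical classes for illustration only; they are not part of Jstacs.
final class CloneConventionSketch {

    /** Stands in for the superclass; its clone() makes the shallow copy. */
    static class Base implements Cloneable {
        int length = 4; // simple data type: the shallow copy is sufficient

        @Override
        public Base clone() throws CloneNotSupportedException {
            return (Base) super.clone();
        }
    }

    /** Plays the role of "X" in the convention. */
    static class X extends Base {
        double[] params = {0.1, 0.9}; // not a simple data type

        @Override
        public X clone() throws CloneNotSupportedException {
            X o = (X) super.clone();   // step 1: shallow copy, cast to X
            o.params = params.clone(); // step 2: deep copy of the array member
            return o;                  // step 3
        }
    }

    /** Returns true if changing the copy leaves the original untouched. */
    static boolean copyIsIndependent() {
        try {
            X original = new X();
            X copy = original.clone();
            copy.params[0] = 0.5;
            return copy != original
                    && original.params[0] == 0.1
                    && copy.length == original.length;
        } catch (CloneNotSupportedException e) {
            return false;
        }
    }
}
```

Without step 2, the original and the copy would share the same params array, and a change made through one would be visible in the other.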

getStructure

public String getStructure()
                    throws NotTrainedException
Returns a String representation of the structure of the used models.

Returns:
a String representation of the structure of the used models
Throws:
NotTrainedException - if the model is not trained yet
See Also:
FSDAGTrainSM.getStructure()

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Specified by:
getInstanceName in interface SequenceScore
Overrides:
getInstanceName in class AbstractMixtureTrainSM
Returns:
a short instance name

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Specified by:
toXML in interface Storable
Overrides:
toXML in class AbstractMixtureTrainSM
Returns:
the XML representation

fromXML

protected void fromXML(StringBuffer representation)
                throws NonParsableException
Description copied from class: AbstractTrainableStatisticalModel
This method should only be used by the constructor that works on a StringBuffer. It is the counterpart of Storable.toXML().

Overrides:
fromXML in class AbstractMixtureTrainSM
Parameters:
representation - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)

getNewParameters

protected void getNewParameters(int iteration,
                                double[][] seqWeights,
                                double[] w)
                         throws Exception
Description copied from class: AbstractMixtureTrainSM
This method trains the internal models on the internal data set and the given weights.

Overrides:
getNewParameters in class AbstractMixtureTrainSM
Parameters:
iteration - the number of times this method has been invoked
seqWeights - the weights for each model and sequence
w - the weights for the components
Throws:
Exception - if the training of the internal models went wrong
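
In a standard EM scheme, the per-model, per-sequence weights handed to a training step like this one are posterior component probabilities. As a sketch under that assumption, the code below computes such an array from component weights w and per-component log-probabilities, using the log-sum-exp trick for numerical stability. EStepSketch, its method and the logProb layout are hypothetical; whether Jstacs computes seqWeights in exactly this way is not stated on this page.

```java
// Hypothetical helper for illustration only; it is not part of Jstacs.
final class EStepSketch {

    /**
     * w[i] is the weight of component i, logProb[i][n] the log-probability of
     * sequence n under component i. Returns seqWeights[i][n], the posterior
     * weight of component i for sequence n.
     * Assumes that for every sequence at least one component has positive
     * weight and a finite log-probability.
     */
    static double[][] responsibilities(double[] w, double[][] logProb) {
        int k = w.length;
        int n = logProb[0].length;
        double[][] seqWeights = new double[k][n];
        double[] joint = new double[k];
        for (int s = 0; s < n; s++) {
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < k; i++) {
                joint[i] = Math.log(w[i]) + logProb[i][s];
                max = Math.max(max, joint[i]);
            }
            // Subtract the maximum before exponentiating to avoid underflow
            double sum = 0;
            for (int i = 0; i < k; i++) {
                seqWeights[i][s] = Math.exp(joint[i] - max);
                sum += seqWeights[i][s];
            }
            for (int i = 0; i < k; i++) {
                seqWeights[i][s] /= sum;
            }
        }
        return seqWeights;
    }
}
```

For each sequence, the resulting weights sum to 1 across the components, matching the layout "weights for each model and sequence" described for seqWeights.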