public class MixtureTrainSM extends AbstractMixtureTrainSM

The class for a mixture model of TrainableStatisticalModels.

Nested classes inherited from AbstractMixtureTrainSM: AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization

Fields inherited from AbstractMixtureTrainSM: algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights

Further inherited fields: alphabets, length

| Modifier | Constructor and Description |
|---|---|
| | MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates an instance using EM and fixed component probabilities. |
| | MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates an instance using Gibbs Sampling and fixed component probabilities. |
| protected | MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double[] weights, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates a new MixtureTrainSM. |
| | MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates an instance using EM and estimating the component probabilities. |
| | MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates an instance using Gibbs Sampling and sampling the component probabilities. |
| | MixtureTrainSM(StringBuffer xml): The constructor for the interface Storable. |
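
The constructor details on this page list several conditions under which an IllegalArgumentException is thrown. The following is a minimal, self-contained sketch of those checks, not the Jstacs implementation: the class MixtureArgumentCheck and its methods are hypothetical, and dimension is assumed here to be the number of component models (models.length). The conditions on length and on componentHyperParams are left out because this page does not spell them out.

```java
class MixtureArgumentCheck {

    // Hypothetical helper, not part of Jstacs: mirrors some of the documented
    // IllegalArgumentException conditions for the constructor arguments.
    static void check(int dimension, double[] weights, int starts) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension < 1");
        }
        if (starts < 1) {
            throw new IllegalArgumentException("starts < 1");
        }
        if (weights != null) {
            if (weights.length != dimension) {
                throw new IllegalArgumentException("weights.length != dimension");
            }
            for (double w : weights) {
                if (w < 0) {
                    throw new IllegalArgumentException("negative weight");
                }
            }
        }
    }

    // Convenience wrapper that reports whether the arguments pass all checks.
    static boolean isAccepted(int dimension, double[] weights, int starts) {
        try {
            check(dimension, weights, starts);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(2, new double[] {0.3, 0.7}, 1));  // true
        System.out.println(isAccepted(2, null, 3));                     // true
        System.out.println(isAccepted(2, new double[] {1.0}, 1));       // false
        System.out.println(isAccepted(2, new double[] {1.5, -0.5}, 1)); // false
        System.out.println(isAccepted(2, new double[] {0.5, 0.5}, 0));  // false
    }
}
```

Passing null for weights is accepted in this sketch because the parameter descriptions allow "null or the weights for the components".
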
| Modifier and Type | Method and Description |
|---|---|
| double[][] | doFirstIteration(DataSet data, double[] dataWeights, double[][] partitioning): This method enables you to train a mixture model with a fixed start partitioning. |
| protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): This method will do the first step in the train algorithm for the current model on the internal data set. |
| protected Sequence[] | emitDataSetUsingCurrentParameterSet(int n, int... lengths): The method returns an array of sequences using the current parameter set. |
| protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end): Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
| protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights): Computes sequence weights and returns the score. |
| protected void | setTrainData(DataSet data): This method is invoked by the train-method and sets, for a given data set, the data set that should be used for training. |
| String | toString(NumberFormat nf): This method returns a String representation of the instance. |

Methods inherited from AbstractMixtureTrainSM: algorithmHasBeenRun, checkLength, checkModelsForGibbsSampling, clone, continueIterations, continueIterations, createSeqWeightsArray, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, extractFurtherInformation, finalize, fromXML, getCharacteristics, getFurtherInformation, getIndexOfMaximalComponentFor, getInstanceName, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParameters, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML, train

Further inherited methods: check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train

protected MixtureTrainSM(int length,
TrainableStatisticalModel[] models,
int starts,
boolean estimateComponentProbs,
double[] componentHyperParams,
double[] weights,
AbstractMixtureTrainSM.Algorithm algorithm,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws IllegalArgumentException,
WrongAlphabetException,
CloneNotSupportedException

Creates a new MixtureTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.

Parameters:
- length - the length used in this model
- models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
- starts - the number of times the algorithm will be started in the train-method, at least 1
- estimateComponentProbs - the switch for estimating the component probabilities in the algorithm or holding them fixed; if the component parameters are fixed, the values of weights will be used, otherwise the componentHyperParams will be incorporated in the adjustment
- componentHyperParams - the hyperparameters for the component assignment prior
  - relevant if estimateComponentProbs == true
  - null or has to have length models.length
  - null or an array with all values zero (0): then ML
- weights - null or the weights for the components (then weights.length == models.length)
- algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
- alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used by train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
- tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
- parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
- initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
- stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
- burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
- IllegalArgumentException - if
  - length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
- WrongAlphabetException - if not all models work on the same alphabet
- CloneNotSupportedException - if the models cannot be cloned

public MixtureTrainSM(int length,
TrainableStatisticalModel[] models,
int starts,
double[] componentHyperParams,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws IllegalArgumentException,
WrongAlphabetException,
CloneNotSupportedException

Creates an instance using EM and estimating the component probabilities.

Parameters:
- length - the length used in this model
- models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
- starts - the number of times the algorithm will be started in the train-method, at least 1
- componentHyperParams - the hyperparameters for the component assignment prior
  - relevant if estimateComponentProbs == true
  - null or has to have length models.length
  - null or an array with all values zero (0): then ML
- alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used by train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
- tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
- parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
- IllegalArgumentException - if
  - length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
- WrongAlphabetException - if not all models work on the same alphabet
- CloneNotSupportedException - if the models cannot be cloned

See Also:
- MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest)
- AbstractMixtureTrainSM.Algorithm.EM

public MixtureTrainSM(int length,
TrainableStatisticalModel[] models,
double[] weights,
int starts,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws IllegalArgumentException,
WrongAlphabetException,
CloneNotSupportedException

Creates an instance using EM and fixed component probabilities.

Parameters:
- length - the length used in this model
- models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
- starts - the number of times the algorithm will be started in the train-method, at least 1
- weights - null or the weights for the components (then weights.length == models.length)
- alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used by train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
- tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
- parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
- IllegalArgumentException - if
  - length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
- WrongAlphabetException - if not all models work on the same alphabet
- CloneNotSupportedException - if the models cannot be cloned

See Also:
- MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest)
- AbstractMixtureTrainSM.Algorithm.EM

public MixtureTrainSM(int length,
TrainableStatisticalModel[] models,
int starts,
double[] componentHyperParams,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws IllegalArgumentException,
WrongAlphabetException,
CloneNotSupportedException

Creates an instance using Gibbs Sampling and sampling the component probabilities.

Parameters:
- length - the length used in this model
- models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
- starts - the number of times the algorithm will be started in the train-method, at least 1
- componentHyperParams - the hyperparameters for the component assignment prior
  - relevant if estimateComponentProbs == true
  - null or has to have length models.length
  - null or an array with all values zero (0): then ML
- initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
- stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
- burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
- IllegalArgumentException - if
  - length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
- WrongAlphabetException - if not all models work on the same alphabet
- CloneNotSupportedException - if the models cannot be cloned

See Also:
- MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest)
- AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

public MixtureTrainSM(int length,
TrainableStatisticalModel[] models,
double[] weights,
int starts,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws IllegalArgumentException,
WrongAlphabetException,
CloneNotSupportedException

Creates an instance using Gibbs Sampling and fixed component probabilities.

Parameters:
- length - the length used in this model
- models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
- starts - the number of times the algorithm will be started in the train-method, at least 1
- weights - null or the weights for the components (then weights.length == models.length)
- initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
- stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
- burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
- IllegalArgumentException - if
  - length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
- WrongAlphabetException - if not all models work on the same alphabet
- CloneNotSupportedException - if the models cannot be cloned

See Also:
- MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest)
- AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

public MixtureTrainSM(StringBuffer xml) throws NonParsableException

The constructor for the interface Storable. Creates a new MixtureTrainSM out of its XML representation.

Parameters:
- xml - the XML representation of the model as StringBuffer

Throws:
- NonParsableException - if the StringBuffer is not parsable

protected Sequence[] emitDataSetUsingCurrentParameterSet(int n, int... lengths) throws Exception

Description copied from AbstractMixtureTrainSM:
The method returns an array of sequences using the current parameter set.

Corresponds to emitDataSetUsingCurrentParameterSet in class AbstractMixtureTrainSM.

Parameters:
- n - the number of sequences to be sampled
- lengths - the corresponding lengths

Throws:
- Exception - if it was impossible to sample the sequences

See Also:
- StatisticalModel.emitDataSet(int, int...)

protected double[][] doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception

Description copied from AbstractMixtureTrainSM:
This method will do the first step in the train algorithm for the current model on the internal data set.

Corresponds to doFirstIteration in class AbstractMixtureTrainSM.

Parameters:
- dataWeights - null or the weights of each element of the data set
- m - the multivariate random generator
- params - the parameters for the multivariate random generator

Throws:
- Exception - if something went wrong

public double[][] doFirstIteration(DataSet data, double[] dataWeights, double[][] partitioning) throws Exception

This method enables you to train a mixture model with a fixed start partitioning.

Parameters:
- data - the data set of sequences
- dataWeights - null or the weights of each element of the data set
- partitioning - a kind of partitioning
  - partitioning.length has to be data.getNumberofElements()
  - partitioning[i].length has to be getNumberOfModels()
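
The shape requirements on partitioning can be illustrated with a small, self-contained sketch. It does not use Jstacs: the class PartitioningSketch and its methods are hypothetical, and the reading that partitioning[i][j] holds the weight of sequence i for component j is an assumption based only on the two length constraints above.

```java
import java.util.Arrays;

class PartitioningSketch {

    // Hypothetical helper: builds a hard partitioning in which sequence i is
    // assigned entirely to component assignment[i] (an assumed interpretation).
    static double[][] hardPartitioning(int[] assignment, int numberOfComponents) {
        double[][] partitioning = new double[assignment.length][numberOfComponents];
        for (int i = 0; i < assignment.length; i++) {
            if (assignment[i] < 0 || assignment[i] >= numberOfComponents) {
                throw new IllegalArgumentException("invalid component index at " + i);
            }
            partitioning[i][assignment[i]] = 1.0;
        }
        return partitioning;
    }

    // Checks the two documented constraints: one row per sequence,
    // one column per component.
    static boolean hasExpectedShape(double[][] partitioning, int numberOfSequences, int numberOfComponents) {
        if (partitioning.length != numberOfSequences) {
            return false;
        }
        for (double[] row : partitioning) {
            if (row.length != numberOfComponents) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] assignment = {0, 1, 1, 0};
        double[][] partitioning = hardPartitioning(assignment, 2);
        System.out.println(hasExpectedShape(partitioning, 4, 2)); // true
        System.out.println(Arrays.deepToString(partitioning));
    }
}
```

A soft partitioning would fill each row with fractional weights instead of a single 1.0; the shape check stays the same.
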

Throws:
- Exception - if something went wrong or if the number of components is 1

protected double getLogProbUsingCurrentParameterSetFor(int component,
Sequence s,
int start,
int end)
throws Exception

Description copied from AbstractMixtureTrainSM:
Returns the logarithmic probability for the sequence and the given component using the current parameter set.

Corresponds to getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM.

Parameters:
- component - the index of the component
- s - the sequence
- start - the start position in the sequence
- end - the end position in the sequence

Returns:
- log P(s,component) = log P(s|component) + log P(component)

Throws:
- Exception - if not trained yet or something else went wrong

See Also:
- AbstractMixtureTrainSM.getNumberOfComponents()

public String toString(NumberFormat nf)

Description copied from SequenceScore:
This method returns a String representation of the instance.

Parameters:
- nf - the NumberFormat for the String representation of parameters or probabilities

Returns:
- a String representation of the instance

protected double getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
throws Exception

Computes sequence weights and returns the score.

Corresponds to getNewWeights in class AbstractMixtureTrainSM.

Parameters:
- dataWeights - the weights for the internal data set (should not be changed)
- w - the array for the statistic of the component parameters (shall be filled)
- seqweights - an array containing for each component the weights for each sequence (shall be filled)

Throws:
- Exception - if something went wrong

protected void setTrainData(DataSet data)

Description copied from AbstractMixtureTrainSM:
This method is invoked by the train-method and sets, for a given data set, the data set that should be used for training.

Corresponds to setTrainData in class AbstractMixtureTrainSM.

Parameters:
- data - the given data set of sequences
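
The return value documented for getLogProbUsingCurrentParameterSetFor, log P(s,component) = log P(s|component) + log P(component), can be illustrated with plain numbers. The sketch below is self-contained and does not call Jstacs; the class LogJointSketch and its methods are hypothetical. The normalization step is an assumption about how such joint log probabilities are commonly turned into per-component weights, and this page does not state that MixtureTrainSM computes them this way.

```java
class LogJointSketch {

    // log P(s, component) = log P(s | component) + log P(component)
    static double logJoint(double logLikelihood, double logComponentProb) {
        return logLikelihood + logComponentProb;
    }

    // Assumed follow-up step: converts joint log probabilities of one sequence
    // into weights that sum to 1, subtracting the maximum for numerical stability.
    static double[] normalize(double[] logJoint) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logJoint) {
            max = Math.max(max, v);
        }
        double[] weights = new double[logJoint.length];
        double sum = 0;
        for (int i = 0; i < logJoint.length; i++) {
            weights[i] = Math.exp(logJoint[i] - max);
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Two components with P(component) = 0.5 each and
        // P(s | component) = 0.2 and 0.6, respectively.
        double[] joint = {
            logJoint(Math.log(0.2), Math.log(0.5)),
            logJoint(Math.log(0.6), Math.log(0.5))
        };
        double[] weights = normalize(joint);
        System.out.println(weights[0] + " " + weights[1]);
    }
}
```

Working in log space keeps products of small probabilities from underflowing, which is why the sum of logarithms is used instead of the product of probabilities.
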