public class StrandTrainSM extends AbstractMixtureTrainSM
TrainableStatisticalModel
Nested classes inherited from class AbstractMixtureTrainSM: AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization
Fields inherited from class AbstractMixtureTrainSM: algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights
Fields inherited from class AbstractTrainableStatisticalModel: alphabets, length
Modifier | Constructor and Description |
---|---|
public | StrandTrainSM(StringBuffer stringBuff): The constructor for the interface Storable. |
protected | StrandTrainSM(TrainableStatisticalModel model, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double forwardStrandProb, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates a new StrandTrainSM. |
public | StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates an instance using EM and estimating the component probabilities. |
public | StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates an instance using Gibbs Sampling and sampling the component probabilities. |
public | StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates an instance using EM and fixed component probabilities. |
public | StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates an instance using Gibbs Sampling and fixed component probabilities. |
Modifier and Type | Method and Description |
---|---|
protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): This method will do the first step in the train algorithm for the current model on the internal data set. |
protected Sequence[] | emitDataSetUsingCurrentParameterSet(int n, int... lengths): The method returns an array of sequences using the current parameter set. |
protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end): Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights): Computes sequence weights and returns the score. |
void | setTrainData(DataSet s): This method is invoked by the train method and sets for a given data set the data set that should be used for train. |
String | toString(NumberFormat nf): This method returns a String representation of the instance. |
Methods inherited from class AbstractMixtureTrainSM: algorithmHasBeenRun, checkLength, checkModelsForGibbsSampling, clone, continueIterations, continueIterations, createSeqWeightsArray, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, extractFurtherInformation, finalize, fromXML, getCharacteristics, getFurtherInformation, getIndexOfMaximalComponentFor, getInstanceName, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParameters, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML, train
Methods inherited from class AbstractTrainableStatisticalModel: check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
protected StrandTrainSM(TrainableStatisticalModel model, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double forwardStrandProb, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException

Creates a new StrandTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.

Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
estimateComponentProbs - the switch for estimating the component probabilities in the algorithm or holding them fixed; if the component parameters are fixed, the value forwardStrandProb will be used, otherwise the componentHyperParams will be incorporated in the adjustment
componentHyperParams - the hyperparameters for the component assignment prior
  - used only if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - … parameterization
forwardStrandProb - the probability for the forward strand
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; … AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (… stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
IllegalArgumentException - if
  - … length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned

public StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException

Creates an instance using EM and estimating the component probabilities.

Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  - used only if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - … parameterization
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; … AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
IllegalArgumentException - if
  - … length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned

See Also:
StrandTrainSM(TrainableStatisticalModel, int, boolean, double[], double, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
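
The alpha parameter is described as initializing the gammas, with alpha = 1 corresponding to a uniform distribution on a simplex. The following is a minimal sketch of such a draw for the recommended case alpha = 1 only. It is not the Jstacs implementation: the class GammaInitSketch and the method drawUniformOnSimplex are hypothetical names, and the approach (normalizing independent Exponential(1) draws) is one standard way to sample uniformly from a simplex.

```java
import java.util.Random;

// Hypothetical helper, not part of Jstacs.
final class GammaInitSketch {

    /** Draws one point uniformly from the simplex, i.e. from Dirichlet(1, ..., 1). */
    static double[] drawUniformOnSimplex(int dimension, Random rnd) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension < 1");
        }
        double[] gamma = new double[dimension];
        double sum = 0;
        for (int i = 0; i < dimension; i++) {
            // -log(U) with U in (0, 1] is Exponential(1), i.e. Gamma with shape 1
            gamma[i] = -Math.log(1.0 - rnd.nextDouble());
            sum += gamma[i];
        }
        for (int i = 0; i < dimension; i++) {
            gamma[i] /= sum; // normalize so that the entries sum to 1
        }
        return gamma;
    }
}
```

For the two strand components, drawUniformOnSimplex(2, rnd) would give one initial pair of weights (forward, reverse) per sequence.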
public StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException

Creates an instance using EM and fixed component probabilities.

Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
forwardStrandProb - the probability for the forward strand
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; … AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
IllegalArgumentException - if
  - … length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned

See Also:
StrandTrainSM(TrainableStatisticalModel, int, boolean, double[], double, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
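
To make the role of a fixed forwardStrandProb in EM more concrete, here is a hedged sketch of a generic E-step for a two-component strand mixture. It assumes the per-strand log-likelihoods log P(s_n | strand) are already available, and it is a textbook formulation rather than the code used by StrandTrainSM. The class StrandEStepSketch and the method computeWeights are hypothetical; the layout of seqWeights (row 0 = forward, row 1 = reverse) is an assumption as well.

```java
// Hypothetical helper, not part of Jstacs.
final class StrandEStepSketch {

    /**
     * Fills seqWeights[0][n] (forward) and seqWeights[1][n] (reverse) with the
     * posterior strand probabilities of sequence n and returns the log-likelihood.
     * logProbFwd[n] and logProbRev[n] are log P(s_n | strand).
     */
    static double computeWeights(double forwardStrandProb, double[] logProbFwd,
                                 double[] logProbRev, double[][] seqWeights) {
        if (!(forwardStrandProb >= 0 && forwardStrandProb <= 1)) {
            throw new IllegalArgumentException("forwardStrandProb must be in [0, 1]");
        }
        double logFwd = Math.log(forwardStrandProb);
        double logRev = Math.log(1.0 - forwardStrandProb);
        double score = 0;
        for (int n = 0; n < logProbFwd.length; n++) {
            double a = logFwd + logProbFwd[n]; // log P(s_n, forward)
            double b = logRev + logProbRev[n]; // log P(s_n, reverse)
            // log-sum-exp for numerical stability
            double max = Math.max(a, b);
            double logSum = max + Math.log(Math.exp(a - max) + Math.exp(b - max));
            seqWeights[0][n] = Math.exp(a - logSum);
            seqWeights[1][n] = Math.exp(b - logSum);
            score += logSum;
        }
        return score;
    }
}
```

With forwardStrandProb = 0.5, P(s | forward) = 0.3 and P(s | reverse) = 0.1, the weights come out as 0.75 and 0.25 and the returned score is log 0.2.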
public StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException

Creates an instance using Gibbs Sampling and sampling the component probabilities.

Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  - used only if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - … parameterization
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (… stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
IllegalArgumentException - if
  - … length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned

See Also:
StrandTrainSM(TrainableStatisticalModel, int, boolean, double[], double, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
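
For readers unfamiliar with Gibbs sampling in a mixture, the following sketch shows the generic step of drawing a strand assignment for one sequence in proportion to its joint probability under each strand. This illustrates the general technique only, under the assumption that both joint log-probabilities are given and at least one is finite; it does not claim to reproduce what StrandTrainSM does internally. StrandGibbsSketch and drawStrand are hypothetical names.

```java
import java.util.Random;

// Hypothetical helper, not part of Jstacs.
final class StrandGibbsSketch {

    /**
     * Returns 0 (forward) or 1 (reverse), drawn with probability proportional to
     * exp(logJointFwd) and exp(logJointRev). At least one argument must be finite.
     */
    static int drawStrand(double logJointFwd, double logJointRev, Random rnd) {
        // subtract the maximum before exponentiating to avoid underflow
        double max = Math.max(logJointFwd, logJointRev);
        double pf = Math.exp(logJointFwd - max);
        double pr = Math.exp(logJointRev - max);
        return rnd.nextDouble() * (pf + pr) < pf ? 0 : 1;
    }
}
```

If one strand has probability zero (log-probability negative infinity), the other strand is always chosen.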
public StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException

Creates an instance using Gibbs Sampling and fixed component probabilities.

Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
forwardStrandProb - the probability for the forward strand
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (… stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
IllegalArgumentException - if
  - … length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned

See Also:
StrandTrainSM(TrainableStatisticalModel, int, boolean, double[], double, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
public StrandTrainSM(StringBuffer stringBuff) throws NonParsableException

The constructor for the interface Storable. Creates a new StrandTrainSM out of its XML representation.

Parameters:
stringBuff - the StringBuffer containing the XML representation of the model

Throws:
NonParsableException - if the StringBuffer could not be parsed

public void setTrainData(DataSet s) throws Exception

Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train method and sets for a given data set the data set that should be used for train.

setTrainData in class AbstractMixtureTrainSM

Parameters:
s - the given data set of sequences

Throws:
Exception - if something went wrong

protected double[][] doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception

Description copied from class: AbstractMixtureTrainSM
This method will do the first step in the train algorithm for the current model on the internal data set.

doFirstIteration in class AbstractMixtureTrainSM

Parameters:
dataWeights - null or the weights of each element of the data set
m - the multivariate random generator
params - the parameters for the multivariate random generator

Throws:
Exception - if something went wrong

protected double getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) throws Exception

Computes sequence weights and returns the score.

getNewWeights in class AbstractMixtureTrainSM

Parameters:
dataWeights - the weights for the internal data set (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)

Throws:
Exception - if something went wrong

public String toString(NumberFormat nf)

Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities

Returns:
a String representation of the instance

protected Sequence[] emitDataSetUsingCurrentParameterSet(int n, int... lengths) throws NotTrainedException, Exception

Description copied from class: AbstractMixtureTrainSM
The method returns an array of sequences using the current parameter set.

emitDataSetUsingCurrentParameterSet in class AbstractMixtureTrainSM

Parameters:
n - the number of sequences to be sampled
lengths - the corresponding lengths

Throws:
Exception - if it was impossible to sample the sequences
NotTrainedException

See Also:
StatisticalModel.emitDataSet(int, int...)
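
As an illustration of what emitting from a strand mixture could involve, the sketch below draws a strand for each sequence, emits from a toy base model, and reverse-complements the result when the reverse strand is chosen. Everything here is an assumption for illustration: the toy model is an i.i.d. distribution over A, C, G, T rather than a TrainableStatisticalModel, one length is supplied per sequence, and StrandEmitSketch, emit and reverseComplement are hypothetical names.

```java
import java.util.Random;

// Hypothetical helper, not part of Jstacs.
final class StrandEmitSketch {
    private static final String DNA = "ACGT";

    /** Reverse complement of a DNA string over A, C, G, T. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            int idx = DNA.indexOf(s.charAt(i));
            if (idx < 0) {
                throw new IllegalArgumentException("not a DNA symbol: " + s.charAt(i));
            }
            // A<->T and C<->G: index k maps to 3 - k in "ACGT"
            sb.append(DNA.charAt(3 - idx));
        }
        return sb.toString();
    }

    /** Emits one sequence per entry of lengths; baseProbs is a toy i.i.d. model over A, C, G, T. */
    static String[] emit(double forwardStrandProb, double[] baseProbs, Random rnd, int... lengths) {
        String[] out = new String[lengths.length];
        for (int n = 0; n < lengths.length; n++) {
            StringBuilder sb = new StringBuilder(lengths[n]);
            for (int pos = 0; pos < lengths[n]; pos++) {
                double u = rnd.nextDouble();
                int k = 0;
                // inverse-CDF draw; the last symbol absorbs rounding error
                while (k < 3 && u >= baseProbs[k]) {
                    u -= baseProbs[k];
                    k++;
                }
                sb.append(DNA.charAt(k));
            }
            String s = sb.toString();
            // keep the forward strand with probability forwardStrandProb
            out[n] = rnd.nextDouble() < forwardStrandProb ? s : reverseComplement(s);
        }
        return out;
    }
}
```

With forwardStrandProb = 0 and a base model that always emits A, every emitted sequence consists only of T, since each one is reverse-complemented.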
protected double getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end) throws Exception

Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability for the sequence and the given component using the current parameter set.

getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM

Parameters:
component - the index of the component
s - the sequence
start - the start position in the sequence
end - the end position in the sequence

Returns:
log P(s,component) = log P(s|component) + log P(component)

Throws:
Exception - if not trained yet or something else went wrong

See Also:
AbstractMixtureTrainSM.getNumberOfComponents()
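
The identity log P(s,component) = log P(s|component) + log P(component) can be sketched for a strand mixture as follows. This is a minimal illustration under stated assumptions, not the Jstacs implementation: the base model is a toy i.i.d. distribution over A, C, G, T, component 0 is taken to be the forward strand and component 1 the reverse complement, and the end position is treated as inclusive. StrandLogProbSketch, index and logJoint are hypothetical names.

```java
// Hypothetical helper, not part of Jstacs.
final class StrandLogProbSketch {
    private static final String DNA = "ACGT";

    /** Index of a DNA symbol in "ACGT"; its complement has index 3 - index. */
    static int index(char c) {
        int idx = DNA.indexOf(c);
        if (idx < 0) {
            throw new IllegalArgumentException("not a DNA symbol: " + c);
        }
        return idx;
    }

    /**
     * log P(s, component) for s[start..end] (end inclusive, an assumption).
     * component 0 = forward strand, 1 = reverse complement (assumed ordering).
     */
    static double logJoint(int component, String s, int start, int end,
                           double[] baseProbs, double forwardStrandProb) {
        if (component != 0 && component != 1) {
            throw new IllegalArgumentException("component must be 0 or 1");
        }
        // log P(component)
        double logPrior = Math.log(component == 0 ? forwardStrandProb : 1.0 - forwardStrandProb);
        // log P(s | component)
        double logCond = 0;
        for (int i = start; i <= end; i++) {
            int idx = index(s.charAt(i));
            // for an i.i.d. model, scoring the reverse complement only swaps each symbol
            logCond += Math.log(baseProbs[component == 0 ? idx : 3 - idx]);
        }
        return logCond + logPrior;
    }
}
```

For base probabilities (A, C, G, T) = (0.4, 0.1, 0.2, 0.3), the sequence AC and forwardStrandProb = 0.5, the forward component gives log(0.4 * 0.1 * 0.5) = log 0.02 and the reverse component gives log(0.3 * 0.2 * 0.5) = log 0.03.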