java.lang.Object
  de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
    de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
      de.jstacs.sequenceScores.statisticalModels.trainable.mixture.StrandTrainSM
public class StrandTrainSM
This model handles sequences that can either lie on the forward strand or on
the reverse complementary strand. Therefore it is recommended to use this model only for
DNA, but it is not restricted to DNA.
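The reverse complementary strand mentioned above can be illustrated with a small helper. This is a minimal sketch and not Jstacs code: the class ReverseComplementSketch and its methods are hypothetical names, and the sketch assumes upper-case DNA strings over A, C, G and T.

```java
// Illustrative helper, not part of Jstacs; all names are hypothetical.
final class ReverseComplementSketch {

    /** Returns the reverse complement of an upper-case DNA string (A, C, G, T only). */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        // walk the sequence from its end and complement each base
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    /** Maps a base to its Watson-Crick partner. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default: throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ACGTT")); // prints AACGT
    }
}
```

For alphabets without a natural complement the notion of a reverse strand has to be defined by the alphabet itself, which is why the model is recommended mainly for DNA.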
If you use Gibbs Sampling, temporary files will be created in the Java temp
folder. These files are deleted once no reference to the current instance
exists and the Garbage Collector runs. It is therefore recommended to
call the Garbage Collector explicitly at the end of any application.
See Also: TrainableStatisticalModel

| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
| AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization |
| Field Summary |
|---|
| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
| algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights |
| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
| alphabets, length |
| Constructor Summary | | |
|---|---|---|
| | StrandTrainSM(StringBuffer stringBuff) | The constructor for the interface Storable. |
| protected | StrandTrainSM(TrainableStatisticalModel model, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double forwardStrandProb, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) | Creates a new StrandTrainSM. |
| | StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) | Creates an instance using EM and estimating the component probabilities. |
| | StrandTrainSM(TrainableStatisticalModel model, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest) | Creates an instance using Gibbs Sampling and sampling the component probabilities. |
| | StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) | Creates an instance using EM and fixed component probabilities. |
| | StrandTrainSM(TrainableStatisticalModel model, int starts, double forwardStrandProb, int initialIteration, int stationaryIteration, BurnInTest burnInTest) | Creates an instance using Gibbs Sampling and fixed component probabilities. |
| Method Summary | | |
|---|---|---|
| protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) | This method will do the first step in the train algorithm for the current model on the internal sample. |
| protected Sequence[] | emitDataSetUsingCurrentParameterSet(int n, int... lengths) | The method returns an array of sequences using the current parameter set. |
| protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end) | Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
| protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) | Computes sequence weights and returns the score. |
| void | setTrainData(DataSet s) | This method is invoked by the train-method and sets, for a given sample, the sample that should be used for training. |
| String | toString() | Should give a simple representation (text) of the model as String. |

| Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
|---|
| check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, train |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
protected StrandTrainSM(TrainableStatisticalModel model,
int starts,
boolean estimateComponentProbs,
double[] componentHyperParams,
double forwardStrandProb,
AbstractMixtureTrainSM.Algorithm algorithm,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a new StrandTrainSM. This constructor can be used for any
algorithm since it takes all necessary values as parameters.
Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train-method, at least 1
estimateComponentProbs - the switch for estimating the component probabilities in the algorithm or holding them fixed; if the component parameters are fixed, the value forwardStrandProb will be used, otherwise the componentHyperParams will be incorporated in the adjustment
componentHyperParams - the hyperparameters for the component assignment prior
  - only used if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - ... parameterization
forwardStrandProb - the probability for the forward strand
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
  - ... length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models can not be cloned
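Some of the argument conditions listed above can be sketched as a standalone check. This is only an illustration under stated assumptions, not the Jstacs implementation: the class StrandArgsCheckSketch is hypothetical, and the non-negativity test for the hyperparameters and the [0, 1] range test for forwardStrandProb are assumptions about what "not correct" may mean.

```java
// Illustrative argument check, not Jstacs code; names and some rules are assumptions.
final class StrandArgsCheckSketch {

    static void check(int starts, boolean estimateComponentProbs,
                      double[] componentHyperParams, double forwardStrandProb) {
        // documented: the algorithm has to be started at least once
        if (starts < 1) {
            throw new IllegalArgumentException("starts has to be at least 1");
        }
        if (estimateComponentProbs) {
            // documented: null is allowed, otherwise one value per strand (length 2)
            if (componentHyperParams != null) {
                if (componentHyperParams.length != 2) {
                    throw new IllegalArgumentException("componentHyperParams has to be null or have length 2");
                }
                for (double h : componentHyperParams) {
                    // assumption: negative or NaN hyperparameters count as "not correct"
                    if (!(h >= 0)) {
                        throw new IllegalArgumentException("componentHyperParams must not be negative");
                    }
                }
            }
        } else {
            // assumption: a fixed strand probability has to lie in [0, 1]
            if (!(forwardStrandProb >= 0 && forwardStrandProb <= 1)) {
                throw new IllegalArgumentException("forwardStrandProb has to be in [0, 1]");
            }
        }
    }
}
```

Passing null (or an all-zero array) for the hyperparameters is accepted by this check, matching the ML case described in the parameter list.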
public StrandTrainSM(TrainableStatisticalModel model,
int starts,
double[] componentHyperParams,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates an instance using EM and estimating the component probabilities.
Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  - only used if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - ... parameterization
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
  - ... length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
StrandTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, int, boolean, double[], double, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.EM
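The recommendation alpha = 1 for the alpha parameter above corresponds to a uniform distribution on a simplex. As a hedged illustration of what drawing such an initial weight vector could look like, the sketch below samples from Dirichlet(1, ..., 1) by normalizing exponential variates. The class UniformSimplexSketch is hypothetical, the sketch covers only alpha = 1, and whether Jstacs initializes its gammas in exactly this way is an assumption.

```java
import java.util.Random;

// Illustrative sampler, not Jstacs code; covers only the case alpha = 1.
final class UniformSimplexSketch {

    /** Draws a point uniformly from the simplex of k non-negative weights summing to 1. */
    static double[] sampleUniformSimplex(int k, Random rng) {
        if (k < 1) {
            throw new IllegalArgumentException("k has to be at least 1");
        }
        double[] g = new double[k];
        double sum = 0;
        for (int i = 0; i < k; i++) {
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite;
            // -log(U) is an Exponential(1) variate, i.e. Gamma(1, 1)
            g[i] = -Math.log(1.0 - rng.nextDouble());
            sum += g[i];
        }
        // normalizing independent Gamma(1, 1) variates yields Dirichlet(1, ..., 1)
        for (int i = 0; i < k; i++) {
            g[i] /= sum;
        }
        return g;
    }

    public static void main(String[] args) {
        // two components: forward strand and reverse complementary strand
        double[] w = sampleUniformSimplex(2, new Random(1));
        System.out.println(w[0] + " " + w[1]);
    }
}
```

For the strand model k is 2, so each draw is a pair of weights for the forward and the reverse complementary strand.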
public StrandTrainSM(TrainableStatisticalModel model,
int starts,
double forwardStrandProb,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates an instance using EM and fixed component probabilities.
Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train-method, at least 1
forwardStrandProb - the probability for the forward strand
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm, tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
  - ... length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
StrandTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, int, boolean, double[], double, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.EM
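With a fixed forward strand probability, the expectation step of EM for two strands reduces to normalizing two joint probabilities per sequence. The following is a generic two-component mixture E-step written as a sketch, not the Jstacs implementation: the class StrandEStepSketch, the array layout and the method name are hypothetical, and it assumes 0 < forwardStrandProb < 1.

```java
// Generic two-component E-step, not Jstacs code; names and layout are assumptions.
final class StrandEStepSketch {

    /**
     * logLik[c][n] holds log P(s_n | component c), with c = 0 for the forward
     * strand and c = 1 for the reverse complementary strand.
     * Fills seqWeights[c][n] with the posterior probability of component c for
     * sequence n and returns the log-likelihood of all sequences.
     */
    static double eStep(double[][] logLik, double forwardStrandProb, double[][] seqWeights) {
        double logFwd = Math.log(forwardStrandProb);
        double logRev = Math.log(1.0 - forwardStrandProb);
        double total = 0;
        for (int n = 0; n < logLik[0].length; n++) {
            double a = logLik[0][n] + logFwd; // log P(s_n, forward)
            double b = logLik[1][n] + logRev; // log P(s_n, reverse complement)
            // log-sum-exp keeps the normalization stable for very small probabilities
            double max = Math.max(a, b);
            double logSum = max + Math.log(Math.exp(a - max) + Math.exp(b - max));
            seqWeights[0][n] = Math.exp(a - logSum);
            seqWeights[1][n] = Math.exp(b - logSum);
            total += logSum;
        }
        return total;
    }
}
```

The two weights of each sequence sum to 1, and the returned value is the quantity a termination condition based on the likelihood could monitor between iterations.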
public StrandTrainSM(TrainableStatisticalModel model,
int starts,
double[] componentHyperParams,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates an instance using Gibbs Sampling and sampling the component probabilities.
Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  - only used if estimateComponentProbs == true
  - has to be null or has to have length 2
  - if null or an array with all values zero (0), then ML
  - ... parameterization
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
  - ... length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
StrandTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, int, boolean, double[], double, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
public StrandTrainSM(TrainableStatisticalModel model,
int starts,
double forwardStrandProb,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates an instance using Gibbs Sampling and fixed component probabilities.
Parameters:
model - the model building the basis of the StrandTrainSM; if the instance is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
starts - the number of times the algorithm will be started in the train-method, at least 1
forwardStrandProb - the probability for the forward strand
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
  - ... length
  - dimension < 1
  - weights != null && weights.length != dimension
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
StrandTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, int, boolean, double[], double, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
public StrandTrainSM(StringBuffer stringBuff)
throws NonParsableException
The constructor for the interface Storable. Creates a
new StrandTrainSM out of its XML representation.
Parameters:
stringBuff - the StringBuffer containing the XML representation of the model
Throws:
NonParsableException - if the StringBuffer could not be parsed

| Method Detail |
|---|
public void setTrainData(DataSet s)
throws Exception
Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train-method and sets, for a
given sample, the sample that should be used for training.
setTrainData in class AbstractMixtureTrainSM
Parameters:
s - the given sample of sequences
Throws:
Exception - if something went wrong
protected double[][] doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception
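Description copied from class: AbstractMixtureTrainSM
This method will do the first step in the train algorithm for the current model on the internal sample.
doFirstIteration in class AbstractMixtureTrainSM
Parameters:
dataWeights - null or the weights of each element of the sample
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong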
protected double getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
throws Exception
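Computes sequence weights and returns the score.
getNewWeights in class AbstractMixtureTrainSM
Parameters:
dataWeights - the weights for the internal sample (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)
Throws:
Exception - if something went wrong
public String toString()
Description copied from interface: TrainableStatisticalModel
Should give a simple representation (text) of the model as String.
Specified by:
toString in interface TrainableStatisticalModel
Overrides:
toString in class Object
Returns:
a String representation of the model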
protected Sequence[] emitDataSetUsingCurrentParameterSet(int n,
int... lengths)
throws NotTrainedException,
Exception
Description copied from class: AbstractMixtureTrainSM
The method returns an array of sequences using the current parameter set.
emitDataSetUsingCurrentParameterSet in class AbstractMixtureTrainSM
Parameters:
n - the number of sequences to be sampled
lengths - the corresponding lengths
Throws:
Exception - if it was impossible to sample the sequences
NotTrainedException
See Also:
StatisticalModel.emitDataSet(int, int...)
protected double getLogProbUsingCurrentParameterSetFor(int component,
Sequence s,
int start,
int end)
throws Exception
Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability for the sequence and the given component using the current parameter set.
getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM
Parameters:
component - the index of the component
s - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureTrainSM.getNumberOfComponents()
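The identity log P(s,component) = log P(s|component) + log P(component) can be made concrete with a toy model. The sketch below is not Jstacs code: it assumes a simple position weight matrix as the underlying model, component 0 for the forward strand and component 1 for the reverse complementary strand, and it omits the start and end arguments of the real method. All class and method names are hypothetical, and sequences are assumed to contain only A, C, G and T.

```java
// Toy illustration of the joint log-probability, not Jstacs code; names are hypothetical.
final class StrandJointLogProbSketch {

    static final String BASES = "ACGT";

    /** log P(s | model) for a position weight matrix pwm[position][base index]. */
    static double logProb(double[][] pwm, String s) {
        double lp = 0;
        for (int i = 0; i < s.length(); i++) {
            lp += Math.log(pwm[i][BASES.indexOf(s.charAt(i))]);
        }
        return lp;
    }

    /** Reverse complement; in "ACGT" the complement of index i is index 3 - i. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(BASES.charAt(3 - BASES.indexOf(s.charAt(i))));
        }
        return sb.toString();
    }

    /** log P(s, component) = log P(s | component) + log P(component). */
    static double jointLogProb(double[][] pwm, String s, int component, double forwardStrandProb) {
        if (component == 0) {
            return logProb(pwm, s) + Math.log(forwardStrandProb);
        }
        if (component == 1) {
            return logProb(pwm, reverseComplement(s)) + Math.log(1.0 - forwardStrandProb);
        }
        throw new IllegalArgumentException("component has to be 0 or 1");
    }

    public static void main(String[] args) {
        double[][] pwm = { { 0.7, 0.1, 0.1, 0.1 }, { 0.1, 0.6, 0.2, 0.1 } };
        System.out.println(jointLogProb(pwm, "AC", 0, 0.5));
        System.out.println(jointLogProb(pwm, "AC", 1, 0.5));
    }
}
```

Summing the exponentiated joint values over both components gives the mixture probability P(s) of the sequence.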