public class ZOOPSTrainSM extends HiddenMotifMixture
Nested classes inherited: AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization, MotifDiscoverer.KindOfProfile

| Modifier and Type | Field and Description |
|---|---|
| protected byte | bgMaxMarkovOrder: The order of the background model. |
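The field bgMaxMarkovOrder is described only as the order of the background model, and the constructors require the background model to score sequences of arbitrary length. As an illustration of both points, here is a minimal sketch of a homogeneous Markov model of order k over DNA. The class MarkovBackgroundSketch, the ACGT alphabet and the pseudocount smoothing are hypothetical choices made for this example; this is not the Jstacs background model. The first k positions of a sequence are scored with shorter contexts, which is what lets the model handle any length.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not a Jstacs class: a homogeneous Markov model of
 * order k over the DNA alphabet ACGT. The first k positions of a sequence
 * use shorter contexts, so sequences of any length can be scored.
 */
final class MarkovBackgroundSketch {
    private static final String ALPHABET = "ACGT";
    private final int order;
    private final double pseudocount;
    // context string -> counts of the following symbol (A, C, G, T)
    private final Map<String, double[]> counts = new HashMap<>();

    MarkovBackgroundSketch(int order, double pseudocount) {
        if (order < 0 || pseudocount <= 0) {
            throw new IllegalArgumentException("order >= 0 and pseudocount > 0 required");
        }
        this.order = order;
        this.pseudocount = pseudocount;
    }

    /** The (at most order) symbols preceding position i. */
    private String context(String seq, int i) {
        return seq.substring(Math.max(0, i - order), i);
    }

    /** Adds the symbol counts of the given training sequences. */
    void train(String... seqs) {
        for (String s : seqs) {
            for (int i = 0; i < s.length(); i++) {
                double[] c = counts.computeIfAbsent(context(s, i), k -> new double[4]);
                c[ALPHABET.indexOf(s.charAt(i))]++;
            }
        }
    }

    /** log P(seq) as a sum of smoothed conditional log probabilities. */
    double logProb(String seq) {
        double lp = 0;
        for (int i = 0; i < seq.length(); i++) {
            double[] c = counts.getOrDefault(context(seq, i), new double[4]);
            double total = c[0] + c[1] + c[2] + c[3] + 4 * pseudocount;
            lp += Math.log((c[ALPHABET.indexOf(seq.charAt(i))] + pseudocount) / total);
        }
        return lp;
    }

    public static void main(String[] args) {
        MarkovBackgroundSketch bg = new MarkovBackgroundSketch(1, 1.0);
        bg.train("ACGTACGTAA", "TTGACA");
        // a short and a long sequence are both scored by the same model
        System.out.println(bg.logProb("AC"));
        System.out.println(bg.logProb("ACGTTTGACAGGT"));
    }
}
```

With order 0 the model reduces to independent symbol frequencies; raising the order makes each symbol depend on its k predecessors.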
Fields inherited from class HiddenMotifMixture: posPrior

Fields inherited from class AbstractMixtureTrainSM: algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights

Fields inherited from class AbstractTrainableStatisticalModel: alphabets, length

| Modifier | Constructor and Description |
|---|---|
| | ZOOPSTrainSM(StringBuffer xml): The standard constructor for the interface Storable. |
| protected | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest): Creates a new ZOOPSTrainSM. |
| | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates a new ZOOPSTrainSM using EM and estimating the probability for finding a motif. |
| | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): Creates a new ZOOPSTrainSM using EM and a fixed probability for finding a motif. |
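The EM constructors take alpha, the positive parameter of the Dirichlet distribution used to initialize the gammas when train is invoked, and the documentation recommends alpha = 1 because it gives the uniform distribution on a simplex. A minimal sketch of that special case follows, assuming the gammas are per-sequence weights over the two components of the ZOOPS mixture. It relies on the standard fact that normalizing independent Gamma(1, 1) draws, which are Exponential(1) draws, yields a Dirichlet(1, ..., 1) sample. The class DirichletInitSketch and its method are hypothetical, this is not the Jstacs initialization code, and a general alpha would need a Gamma(alpha, 1) sampler in place of the exponential draw.

```java
import java.util.Random;

/**
 * Hypothetical sketch, not Jstacs code: draws a point from the symmetric
 * Dirichlet distribution with alpha = 1, i.e. uniformly from the simplex.
 */
final class DirichletInitSketch {

    /** Returns a vector of the given dimension with non-negative entries summing to 1. */
    static double[] uniformSimplex(int dimension, Random rng) {
        double[] gamma = new double[dimension];
        double sum = 0;
        for (int i = 0; i < dimension; i++) {
            // Gamma(1, 1) equals Exponential(1); 1 - nextDouble() lies in (0, 1], so the log is finite
            gamma[i] = -Math.log(1.0 - rng.nextDouble());
            sum += gamma[i];
        }
        for (int i = 0; i < dimension; i++) {
            gamma[i] /= sum; // normalizing independent Gamma draws gives a Dirichlet draw
        }
        return gamma;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // one initial weight vector per sequence for a two-component mixture
        for (int n = 0; n < 3; n++) {
            double[] g = uniformSimplex(2, rng);
            System.out.printf("sequence %d: component 0 = %.3f, component 1 = %.3f%n", n, g[0], g[1]);
        }
    }
}
```

For two components, the first entry of such a draw is uniform on [0, 1], so no initial assignment of a sequence to the motif or background component is favored.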
| Modifier and Type | Method and Description |
|---|---|
| protected double[][] | createSeqWeightsArray(): Creates an array that can be used for weighting sequences in the algorithm. |
| protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): Performs the first step of the training algorithm for the current model on the internal data set. |
| int | getGlobalIndexOfMotifInComponent(int component, int motif): Returns the global index of the motif used in component. |
| protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end): Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
| int | getMinimalSequenceLength(): Returns the minimal length a sequence (or each sequence of a data set) has to have. |
| int | getMotifLength(int motif): Returns the length of the motif with index motif. |
| protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights): Computes sequence weights and returns the score. |
| int | getNumberOfMotifs(): Returns the number of motifs for this MotifDiscoverer. |
| int | getNumberOfMotifsInComponent(int component): Returns the number of motifs that are used in the component component of this MotifDiscoverer. |
| double[] | getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind): Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. |
| double[] | getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos): Returns the probabilities of the strand orientations for a given subsequence if it is considered as a site of the motif model in a specific component. |
| protected double | iterate(int start, double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): Runs the training algorithm for the current model and the internal data set. |
| protected double | modify(double[] containsMotif, double[] startpos, int start, int end): Modifies the computed weights for one sequence and returns the score. |
| void | setShiftCorrection(boolean correct): Enables or disables the phase shift correction. |
| protected void | setTrainData(DataSet data): Invoked by the train method; sets, for a given data set, the data set that should be used for training. |
| void | trainBgModel(DataSet data, double[] weights): Trains the background model. |
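Two of the methods listed above are easier to follow together: getLogProbUsingCurrentParameterSetFor is documented as returning log P(s,component) = log P(s|component) + log P(component), and getProfileOfScoresFor returns a profile with sequence.length() - startpos - motifLength + 1 entries. A minimal sketch of how such a profile feeds a ZOOPS ("zero or one occurrence per sequence") likelihood follows. It rests on simplifying assumptions that are not taken from the Jstacs implementation: an order-0 background, a position weight matrix as motif model, a uniform position prior and a fixed motif probability q. With probability q a sequence carries exactly one site whose start is uniform over the profile positions; otherwise the whole sequence is background. All names (ZoopsLikelihoodSketch, profile, logLikelihood, totalProbability, logs) are hypothetical.

```java
/**
 * Hypothetical sketch, not Jstacs code: ZOOPS likelihood with an order-0
 * background, a PWM motif, a uniform position prior and motif probability q.
 */
final class ZoopsLikelihoodSketch {
    static final String ALPHABET = "ACGT";

    /** Log-odds of a site at every start position (startpos = 0), length L - w + 1. */
    static double[] profile(String seq, double[][] logPwm, double[] logBg) {
        int w = logPwm.length;
        if (seq.length() < w) {
            throw new IllegalArgumentException("sequence shorter than the motif");
        }
        double[] prof = new double[seq.length() - w + 1];
        for (int l = 0; l < prof.length; l++) {
            for (int j = 0; j < w; j++) {
                int a = ALPHABET.indexOf(seq.charAt(l + j));
                prof[l] += logPwm[j][a] - logBg[a];
            }
        }
        return prof;
    }

    /** Numerically stable log(sum(exp(x))). */
    static double logSumExp(double... x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) max = Math.max(max, v);
        if (max == Double.NEGATIVE_INFINITY) return max;
        double s = 0;
        for (double v : x) s += Math.exp(v - max);
        return max + Math.log(s);
    }

    /** log P(s) = logsumexp over components of [log P(s | component) + log P(component)]. */
    static double logLikelihood(String seq, double[][] logPwm, double[] logBg, double q) {
        double logBgSeq = 0;
        for (int i = 0; i < seq.length(); i++) {
            logBgSeq += logBg[ALPHABET.indexOf(seq.charAt(i))];
        }
        double[] prof = profile(seq, logPwm, logBg);
        // motif component: average the site likelihood over the uniform start positions
        double logWithMotif = logBgSeq + logSumExp(prof) - Math.log(prof.length);
        return logSumExp(Math.log(q) + logWithMotif, Math.log(1 - q) + logBgSeq);
    }

    /** Sums P(s) over all 4^n sequences of length n; should be 1 up to rounding. */
    static double totalProbability(int n, double[][] logPwm, double[] logBg, double q) {
        double total = 0;
        for (int code = 0; code < (1 << (2 * n)); code++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0, c = code; i < n; i++, c >>= 2) {
                sb.append(ALPHABET.charAt(c & 3));
            }
            total += Math.exp(logLikelihood(sb.toString(), logPwm, logBg, q));
        }
        return total;
    }

    /** Element-wise natural logarithm. */
    static double[] logs(double... p) {
        double[] r = new double[p.length];
        for (int i = 0; i < p.length; i++) r[i] = Math.log(p[i]);
        return r;
    }

    public static void main(String[] args) {
        double[] logBg = logs(0.3, 0.2, 0.2, 0.3);
        double[][] logPwm = { logs(0.7, 0.1, 0.1, 0.1), logs(0.1, 0.1, 0.1, 0.7) };
        System.out.println(java.util.Arrays.toString(profile("CATGAT", logPwm, logBg)));
        System.out.println(logLikelihood("CATGAT", logPwm, logBg, 0.5));
        System.out.println(totalProbability(4, logPwm, logBg, 0.5));
    }
}
```

Working in log space with a log-sum-exp keeps the per-position terms from underflowing on long sequences; the totalProbability check confirms that the two-component mixture is a proper distribution over sequences of a fixed length.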
Methods inherited from class HiddenMotifMixture: checkLength, clone, emitDataSetUsingCurrentParameterSet, extractFurtherInformation, getFurtherInformation, getInstanceName, getNewParameters, toString, train

Methods inherited from class AbstractMixtureTrainSM: algorithmHasBeenRun, checkModelsForGibbsSampling, continueIterations, continueIterations, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, finalize, fromXML, getCharacteristics, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML

Methods inherited from class AbstractTrainableStatisticalModel: check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface MotifDiscoverer: getIndexOfMaximalComponentFor, getNumberOfComponents

protected ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.

Parameters:
motif - the motif model; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site; if trainOnlyMotifModel == false and algorithm == AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel - a switch whether to train only the motif model
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior:
  - used only if estimateComponentProbs == true
  - has to be null or have length dimension
  - null or an array with all values zero (0) means ML estimation
  - depends on the parameterization
weights - null or the weights for the components (then weights.length == dimension)
posPrior - this object determines the positional distribution that shall be used
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM: the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
  AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
  stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING

Throws:
IllegalArgumentException - if
  - weights != null && weights.length != 2
  - weights != null and there exists an i where weights[i] < 0
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned

public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and estimating the probability for finding a motif.

Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior:
  - used only if estimateComponentProbs == true
  - has to be null or have length dimension
  - null or an array with all values zero (0) means ML estimation
  - depends on the parameterization
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
  AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
IllegalArgumentException - if
  - starts < 1
  - componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned

See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM

public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and a fixed probability for finding a motif.

Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train-method, at least 1
motifProb - the probability of finding a motif in a sequence (in [0,1])
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
  AbstractMixtureTrainSM.Parameterization.LAMBDA

Throws:
IllegalArgumentException - if
  - motifProb < 0 or motifProb > 1
  - starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned

See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM

public ZOOPSTrainSM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Creates a new ZOOPSTrainSM out of its XML representation.

Parameters:
xml - the XML representation of the model as a StringBuffer
Throws:
NonParsableException - if the StringBuffer can not be parsed

protected void setTrainData(DataSet data) throws Exception

Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train method and sets, for a given data set, the data set that should be used for training.
Overrides or implements: setTrainData in class AbstractMixtureTrainSM
Parameters:
data - the given data set of sequences
Throws:
Exception - if something went wrong

protected double[][] createSeqWeightsArray()
Description copied from class: AbstractMixtureTrainSM
Creates an array that can be used for weighting sequences in the algorithm.
Overrides or implements: createSeqWeightsArray in class AbstractMixtureTrainSM

protected double[][] doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception

Description copied from class: AbstractMixtureTrainSM
Performs the first step of the training algorithm for the current model on the internal data set.
Overrides or implements: doFirstIteration in class AbstractMixtureTrainSM
Parameters:
dataWeights - null or the weights of each element of the data set
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong

protected double getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) throws Exception

Description copied from class: AbstractMixtureTrainSM
Computes sequence weights and returns the score.
Overrides or implements: getNewWeights in class AbstractMixtureTrainSM
Parameters:
dataWeights - the weights for the internal data set (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)
Throws:
Exception - if something went wrong

protected double modify(double[] containsMotif, double[] startpos, int start, int end)
Modifies the computed weights for one sequence and returns the score.

Parameters:
containsMotif - an array to return the weights for containing a motif (index 0) or containing no motif (index 1)
startpos - the array containing the scores for each start position (including no motif in the sequence)
start - the start index
end - the end index

protected double getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end) throws Exception

Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability for the sequence and the given component using the current parameter set.
Overrides or implements: getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM
Parameters:
component - the index of the component
seq - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureTrainSM.getNumberOfComponents()

public double[] getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind) throws Exception
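Description copied from interface: MotifDiscoverer
Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. This array should be of length sequence.length() - startpos - motifs[motif].getLength() + 1.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
kind - indicates the kind of profile
Throws:
Exception - if the score could not be computed for any reason

public int getMinimalSequenceLength()

Description copied from class: HiddenMotifMixture
Returns the minimal length a sequence (or each sequence of a data set) has to have.
Overrides or implements: getMinimalSequenceLength in class HiddenMotifMixture

public int getMotifLength(int motif)

Description copied from interface: MotifDiscoverer
Returns the length of the motif with index motif.
Parameters:
motif - the index of the motif
Returns:
the length of the motif with index motif

public int getNumberOfMotifs()

Description copied from interface: MotifDiscoverer
Returns the number of motifs for this MotifDiscoverer.

public int getNumberOfMotifsInComponent(int component)

Description copied from interface: MotifDiscoverer
Returns the number of motifs that are used in the component component of this MotifDiscoverer.
Parameters:
component - the component of the MotifDiscoverer

public double[] getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos) throws Exception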
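Description copied from interface: MotifDiscoverer
Returns the probabilities of the strand orientations for a given subsequence if it is considered as a site of the motif model in a specific component.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
Throws:
Exception - if the strand could not be computed for any reason

public int getGlobalIndexOfMotifInComponent(int component, int motif)

Description copied from interface: MotifDiscoverer
Returns the global index of the motif used in component. The index returned must be at least 0 and less than MotifDiscoverer.getNumberOfMotifs().
Parameters:
component - the component index
motif - the motif index in the component
Returns:
the global index of motif in component

public void trainBgModel(DataSet data, double[] weights) throws Exception

Description copied from class: HiddenMotifMixture
Trains the background model.
Overrides or implements: trainBgModel in class HiddenMotifMixture
Parameters:
data - the data set
weights - the weights
Throws:
Exception - if something went wrong

protected double iterate(int start, double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception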
Description copied from class: AbstractMixtureTrainSM
Runs the training algorithm for the current model and the internal data set.
Overrides or implements: iterate in class AbstractMixtureTrainSM
Parameters:
start - the index of the current training start
dataWeights - the weights for each sequence or null
m - the random generator for initiating the algorithm
params - the parameters for the sequences
Throws:
Exception - if something went wrong
See Also:
AbstractMixtureTrainSM.doFirstIteration(DataSet, double[], MultivariateRandomGenerator, MRGParams[]), AbstractMixtureTrainSM.continueIterations(double[], double[][]), AbstractMixtureTrainSM.continueIterations(double[], double[][], int, int)

public void setShiftCorrection(boolean correct)

Enables or disables the phase shift correction.
Parameters:
correct - switch that determines whether to correct shifts or not
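The modify method is documented as filling containsMotif with the weight for containing a motif (index 0) or no motif (index 1), using a startpos array of scores for each start position plus the no-motif case, and returning the score. A minimal sketch of such a per-sequence E-step follows. It assumes, as hypotheses not confirmed by this page, that startpos[start..end-1] are log scores (log prior plus log likelihood) of a site at each start, that startpos[end] holds the log score of the no-motif case, and that the returned score is the log of the normalizing constant. The class ZoopsEStepSketch and the method modifySketch are hypothetical; this is not the Jstacs implementation.

```java
/**
 * Hypothetical sketch, not Jstacs code: per-sequence E-step of a ZOOPS model.
 * Assumed layout: startpos[start..end-1] are log scores for a site at each
 * start position, startpos[end] is the log score for "no motif".
 */
final class ZoopsEStepSketch {

    static double modifySketch(double[] containsMotif, double[] startpos, int start, int end) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = start; i <= end; i++) {
            max = Math.max(max, startpos[i]);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException("all configurations have probability zero");
        }
        double sum = 0;
        for (int i = start; i <= end; i++) {
            startpos[i] = Math.exp(startpos[i] - max); // rescaled to avoid underflow
            sum += startpos[i];
        }
        double withMotif = 0;
        for (int i = start; i <= end; i++) {
            startpos[i] /= sum; // posterior probability of this configuration
            if (i < end) {
                withMotif += startpos[i];
            }
        }
        containsMotif[0] = withMotif;     // weight for "contains a motif"
        containsMotif[1] = startpos[end]; // weight for "contains no motif"
        return max + Math.log(sum);       // log of the normalizing constant
    }

    public static void main(String[] args) {
        // two possible site starts and the no-motif case, with unnormalized probabilities 1, 1 and 2
        double[] startpos = { Math.log(1), Math.log(1), Math.log(2) };
        double[] containsMotif = new double[2];
        double score = modifySketch(containsMotif, startpos, 0, 2);
        // expected up to rounding: posteriors 0.25, 0.25, 0.5; weights 0.5, 0.5; score log(4)
        System.out.println(java.util.Arrays.toString(startpos));
        System.out.println(java.util.Arrays.toString(containsMotif));
        System.out.println(score);
    }
}
```

Subtracting the maximum before exponentiating keeps the normalization stable when log scores are large in magnitude, as they are for long sequences.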