public class ZOOPSTrainSM extends HiddenMotifMixture
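Nested classes/interfaces inherited from class AbstractMixtureTrainSM: AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization
Nested classes/interfaces inherited from interface MotifDiscoverer: MotifDiscoverer.KindOfProfile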
Modifier and Type | Field and Description |
---|---|
protected byte | bgMaxMarkovOrder: the order of the background model. |

Fields inherited from class HiddenMotifMixture: posPrior
Fields inherited from class AbstractMixtureTrainSM: algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights
Fields inherited from class AbstractTrainableStatisticalModel: alphabets, length
Modifier | Constructor and Description |
---|---|
public | ZOOPSTrainSM(StringBuffer xml): the standard constructor for the interface Storable. |
protected | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest): creates a new ZOOPSTrainSM. |
public | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): creates a new ZOOPSTrainSM using EM and estimating the probability of finding a motif. |
public | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization): creates a new ZOOPSTrainSM using EM and a fixed probability of finding a motif. |
Modifier and Type | Method and Description |
---|---|
protected double[][] | createSeqWeightsArray(): creates an array that can be used for weighting sequences in the algorithm. |
protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): performs the first step of the training algorithm for the current model on the internal data set. |
int | getGlobalIndexOfMotifInComponent(int component, int motif): returns the global index of the motif with index motif used in component component. |
protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end): returns the logarithmic probability of the sequence and the given component using the current parameter set. |
int | getMinimalSequenceLength(): returns the minimal length a sequence or a data set has to have. |
int | getMotifLength(int motif): returns the length of the motif with index motif. |
protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights): computes sequence weights and returns the score. |
int | getNumberOfMotifs(): returns the number of motifs of this MotifDiscoverer. |
int | getNumberOfMotifsInComponent(int component): returns the number of motifs that are used in component component of this MotifDiscoverer. |
double[] | getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind): returns the profile of the scores for the component with index component and the motif with index motif at all possible start positions of the motif in sequence, beginning at startpos. |
double[] | getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos): returns the probabilities of the strand orientations of a given subsequence if it is considered a site of the motif model in a specific component. |
protected double | iterate(int start, double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params): runs the training algorithm for the current model on the internal data set. |
protected double | modify(double[] containsMotif, double[] startpos, int start, int end): modifies the computed weights for one sequence and returns the score. |
void | setShiftCorrection(boolean correct): enables or disables the phase shift correction. |
protected void | setTrainData(DataSet data): invoked by the train method; sets, for a given data set, the data set that should be used for training. |
void | trainBgModel(DataSet data, double[] weights): trains the background model. |
Methods inherited from class HiddenMotifMixture: checkLength, clone, emitDataSetUsingCurrentParameterSet, extractFurtherInformation, getFurtherInformation, getInstanceName, getNewParameters, toString, train
Methods inherited from class AbstractMixtureTrainSM: algorithmHasBeenRun, checkModelsForGibbsSampling, continueIterations, continueIterations, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, finalize, fromXML, getCharacteristics, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML
Methods inherited from class AbstractTrainableStatisticalModel: check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface MotifDiscoverer: getIndexOfMaximalComponentFor, getNumberOfComponents
protected ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.
Parameters:
motif - the motif model; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, it has to implement SamplingComponent
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site; if trainOnlyMotifModel == false and algorithm == AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel - a switch whether to train only the motif model
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior; they are only used if estimateComponentProbs == true; the array has to be null or have length dimension; null or an array with all values zero (0) yields ML estimation; the effect depends on the parameterization
weights - null or the weights for the components (then weights.length == dimension)
posPrior - this object determines the positional distribution that shall be used
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM: the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (the uniform distribution on the simplex) is recommended
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; AbstractMixtureTrainSM.Parameterization.LAMBDA is recommended
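initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING: the length of the initial sampling phase (at most stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING: the length of the stationary sampling phase
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING: the BurnInTest used to determine the length of the burn-in phase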
Throws:
IllegalArgumentException - if weights != null && weights.length != 2, if weights != null and there exists an i with weights[i] < 0, if starts < 1, or if componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and estimating the probability of finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior; they are only used if estimateComponentProbs == true; the array has to be null or have length dimension; null or an array with all values zero (0) yields ML estimation; the effect depends on the parameterization
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (the uniform distribution on the simplex) is recommended
tc - the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; AbstractMixtureTrainSM.Parameterization.LAMBDA is recommended
Throws:
IllegalArgumentException - if starts < 1 or if componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
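The alpha parameter controls the random initialization of the gammas before the first EM iteration. As a rough illustration of such an initialization, the self-contained sketch below draws one vector from a symmetric Dirichlet distribution by normalizing independent Gamma(alpha, 1) variates (Marsaglia-Tsang sampler). The class SymmetricDirichlet is hypothetical and not part of Jstacs; how many gammas ZOOPSTrainSM draws and how it uses them is not described in this documentation.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helper (not part of Jstacs): draws a vector from a symmetric
 * Dirichlet(alpha, ..., alpha) distribution.
 */
final class SymmetricDirichlet {

    private SymmetricDirichlet() {}

    /** Draws one Gamma(shape, 1) variate with the Marsaglia-Tsang method. */
    static double sampleGamma(double shape, Random rng) {
        if (shape <= 0) {
            throw new IllegalArgumentException("shape must be positive");
        }
        if (shape < 1) {
            // Boosting: Gamma(a) = Gamma(a + 1) * U^(1/a) for 0 < a < 1.
            return sampleGamma(shape + 1, rng) * Math.pow(rng.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0) {
                continue; // reject and draw again
            }
            v = v * v * v;
            double u = rng.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /** Normalizes k independent Gamma(alpha, 1) draws onto the probability simplex. */
    static double[] sample(int k, double alpha, Random rng) {
        double[] p = new double[k];
        double sum = 0;
        for (int i = 0; i < k; i++) {
            p[i] = sampleGamma(alpha, rng);
            sum += p[i];
        }
        for (int i = 0; i < k; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        // alpha = 1 gives the uniform distribution on the simplex.
        double[] gammas = sample(3, 1.0, new Random(42));
        System.out.println(Arrays.toString(gammas));
    }
}
```

With alpha = 1 every point of the simplex is equally likely; larger values of alpha concentrate the draws around the uniform vector, and smaller values push them toward the corners.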
public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and a fixed probability of finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train method, at least 1
motifProb - the probability of finding a motif in a sequence (in [0,1])
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter of the Dirichlet distribution that is used to initialize the gammas when train is invoked; alpha = 1 (the uniform distribution on the simplex) is recommended
tc - the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA; AbstractMixtureTrainSM.Parameterization.LAMBDA is recommended
Throws:
IllegalArgumentException - if motifProb < 0 or motifProb > 1, or if starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
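The motifProb and posPrior arguments enter the ZOOPS (zero or one occurrence per sequence) likelihood of each sequence: with probability 1 - motifProb the whole sequence is background, otherwise exactly one site is placed at a position drawn from the position prior. The sketch below computes that log-likelihood for a position weight matrix motif, an i.i.d. background and a uniform position prior, all of which are simplifying assumptions; ZOOPSTrainSM itself accepts arbitrary TrainableStatisticalModel and PositionPrior objects. The class ZoopsLikelihood and its integer encoding of the alphabet are hypothetical.

```java
/**
 * Hypothetical sketch (not the Jstacs implementation): log-likelihood of one
 * sequence under a ZOOPS model with a fixed motif probability, a PWM motif,
 * an i.i.d. background and a uniform position prior.
 */
final class ZoopsLikelihood {

    private ZoopsLikelihood() {}

    /** Numerically stable log(sum(exp(x))). */
    static double logSumExp(double... xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /**
     * @param seq       symbols encoded as 0..3
     * @param pwm       pwm[j][a] = P(symbol a at motif position j)
     * @param bg        bg[a] = background probability of symbol a
     * @param motifProb probability that the sequence contains a site, in [0,1]
     */
    static double logLikelihood(int[] seq, double[][] pwm, double[] bg, double motifProb) {
        int n = seq.length;
        int w = pwm.length;
        if (n < w) {
            throw new IllegalArgumentException("sequence shorter than motif");
        }
        double logBg = 0;
        for (int a : seq) {
            logBg += Math.log(bg[a]);
        }
        int positions = n - w + 1;
        double[] logSite = new double[positions];
        for (int l = 0; l < positions; l++) {
            // Uniform position prior, then swap the background term of the site for the motif term.
            double s = logBg - Math.log(positions);
            for (int j = 0; j < w; j++) {
                s += Math.log(pwm[j][seq[l + j]]) - Math.log(bg[seq[l + j]]);
            }
            logSite[l] = s;
        }
        double logWithMotif = logSumExp(logSite);
        return logSumExp(Math.log(1 - motifProb) + logBg, Math.log(motifProb) + logWithMotif);
    }
}
```

Setting motifProb to 0 reduces the result to the background log-likelihood; a PWM whose rows equal the background distribution has the same effect for any motifProb.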
public ZOOPSTrainSM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Creates a new ZOOPSTrainSM out of its XML representation.
Parameters:
xml - the XML representation of the model as a StringBuffer
Throws:
NonParsableException - if the StringBuffer can not be parsed
protected void setTrainData(DataSet data) throws Exception
Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train method and sets, for a given data set, the data set that should be used for training.
Declared in: setTrainData in class AbstractMixtureTrainSM
Parameters:
data - the given data set of sequences
Throws:
Exception - if something went wrong
protected double[][] createSeqWeightsArray()
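Description copied from class: AbstractMixtureTrainSM
Creates an array that can be used for weighting sequences in the algorithm.
Declared in: createSeqWeightsArray in class AbstractMixtureTrainSM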
protected double[][] doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception
Description copied from class: AbstractMixtureTrainSM
Performs the first step of the training algorithm for the current model on the internal data set.
Declared in: doFirstIteration in class AbstractMixtureTrainSM
Parameters:
dataWeights - null or the weights of each element of the data set
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong
protected double getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) throws Exception
Description copied from class: AbstractMixtureTrainSM
Computes sequence weights and returns the score.
Declared in: getNewWeights in class AbstractMixtureTrainSM
Parameters:
dataWeights - the weights for the internal data set (must not be changed)
w - the array for the statistic of the component parameters (to be filled)
seqweights - an array containing, for each component, the weights of each sequence (to be filled)
Throws:
Exception - if something went wrong
protected double modify(double[] containsMotif, double[] startpos, int start, int end)
This method modifies the computed weights for one sequence and returns the score.
Parameters:
containsMotif - an array to return the weights for containing a motif (index 0) or containing no motif (index 1)
startpos - the array containing the scores for each start position (including no motif in the sequence)
start - the start index
end - the end index
protected double getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end) throws Exception
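Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability of the sequence and the given component using the current parameter set.
Declared in: getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM
Parameters:
component - the index of the component
seq - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureTrainSM.getNumberOfComponents()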
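getLogProbUsingCurrentParameterSetFor returns the joint term log P(s,component) = log P(s|component) + log P(component), and modify turns the scores of all start positions (including the no-motif case) into weights for one sequence. A minimal sketch of that normalization, assuming the scores are joint log-probabilities and that the last array entry holds the no-motif case, could look as follows; the actual layout of the startpos array and the role of start and end are not documented here, and ZoopsPosterior is a hypothetical name.

```java
/**
 * Hypothetical sketch of the per-sequence E-step normalization. Each entry of
 * logScores is assumed to be a joint log-probability: one entry per motif
 * start position, plus a LAST entry for "no motif in the sequence".
 */
final class ZoopsPosterior {

    private ZoopsPosterior() {}

    /** log P(s, c) = log P(s | c) + log P(c). */
    static double logJoint(double logConditional, double logPrior) {
        return logConditional + logPrior;
    }

    /**
     * Overwrites logScores with posterior weights, fills
     * containsMotif = {P(motif | s), P(no motif | s)} and returns log P(s).
     */
    static double normalize(double[] logScores, double[] containsMotif) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logScores) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException("all scores are -infinity");
        }
        double sum = 0;
        for (int i = 0; i < logScores.length; i++) {
            logScores[i] = Math.exp(logScores[i] - max); // shifting by max avoids underflow
            sum += logScores[i];
        }
        int last = logScores.length - 1;
        double motif = 0;
        for (int i = 0; i <= last; i++) {
            logScores[i] /= sum;
            if (i < last) {
                motif += logScores[i];
            }
        }
        containsMotif[0] = motif;
        containsMotif[1] = logScores[last];
        return max + Math.log(sum);
    }
}
```

Working in log space and subtracting the maximum before exponentiating avoids underflow for long sequences, whose joint probabilities can fall far below the smallest representable double.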
public double[] getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind) throws Exception
Description copied from interface: MotifDiscoverer
Returns the profile of the scores for the component with index component and the motif with index motif at all possible start positions of the motif in sequence, beginning at startpos. This array should be of length sequence.length() - startpos - motifs[motif].getLength() + 1.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
kind - indicates the kind of profile
Throws:
Exception - if the score could not be computed for any reason
public int getMinimalSequenceLength()
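Description copied from class: HiddenMotifMixture
Returns the minimal length a sequence or a data set has to have.
Declared in: getMinimalSequenceLength in class HiddenMotifMixture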
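The profile returned by getProfileOfScoresFor has one entry per admissible start position, hence sequence.length() - startpos - motifLength + 1 entries, so it contains at least one entry only if sequence.length() >= startpos + motifLength. The sketch below computes such a profile for a position weight matrix given as log-probabilities; the class ScoreProfile and the integer-coded symbols are hypothetical, it is not the Jstacs implementation, and it ignores the MotifDiscoverer.KindOfProfile argument.

```java
/**
 * Hypothetical sketch: the score profile of a PWM motif over one sequence,
 * one log-score per admissible start position at or after startpos.
 */
final class ScoreProfile {

    private ScoreProfile() {}

    /**
     * @param seq      symbols encoded as 0..3
     * @param logPwm   logPwm[j][a] = log P(symbol a at motif position j)
     * @param startpos first start position to score
     * @return array of length seq.length - startpos - logPwm.length + 1
     */
    static double[] profile(int[] seq, double[][] logPwm, int startpos) {
        int w = logPwm.length;
        int len = seq.length - startpos - w + 1;
        if (startpos < 0 || len < 1) {
            throw new IllegalArgumentException("sequence too short for the motif");
        }
        double[] scores = new double[len];
        for (int l = 0; l < len; l++) {
            double s = 0;
            for (int j = 0; j < w; j++) {
                s += logPwm[j][seq[startpos + l + j]];
            }
            scores[l] = s;
        }
        return scores;
    }
}
```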
public int getMotifLength(int motif)
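Description copied from interface: MotifDiscoverer
This method returns the length of the motif with index motif.
Parameters:
motif - the index of the motif
Returns:
the length of the motif with index motif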
public int getNumberOfMotifs()
Description copied from interface: MotifDiscoverer
Returns the number of motifs for this MotifDiscoverer.
public int getNumberOfMotifsInComponent(int component)
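Description copied from interface: MotifDiscoverer
Returns the number of motifs that are used in component component of this MotifDiscoverer.
Parameters:
component - the component of the MotifDiscoverer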
public double[] getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos) throws Exception
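Description copied from interface: MotifDiscoverer
This method returns the probabilities of the strand orientations of a given subsequence if it is considered a site of the motif model in a specific component.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
Throws:
Exception - if the strand could not be computed for any reason
public int getGlobalIndexOfMotifInComponent(int component, int motif)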
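Description copied from interface: MotifDiscoverer
Returns the global index of the motif with index motif used in component component. The index returned must be at least 0 and less than MotifDiscoverer.getNumberOfMotifs().
Parameters:
component - the component index
motif - the motif index in the component
Returns:
the global index of motif in component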
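getStrandProbabilitiesFor returns one probability per strand orientation for a candidate site. The exact computation is not documented here; one common approach, assumed in the sketch below, scores the site on the forward strand and on its reverse complement and normalizes the two prior-weighted likelihoods. StrandProbabilities, the A=0, C=1, G=2, T=3 encoding and the strand prior are hypothetical.

```java
/**
 * Hypothetical sketch: probabilities of the two strand orientations of one
 * candidate site, assuming the site has been scored on the forward strand and
 * on its reverse complement. DNA is encoded as A=0, C=1, G=2, T=3.
 */
final class StrandProbabilities {

    private StrandProbabilities() {}

    /** Reverse complement under the A=0, C=1, G=2, T=3 encoding. */
    static int[] reverseComplement(int[] seq) {
        int[] rc = new int[seq.length];
        for (int i = 0; i < seq.length; i++) {
            rc[i] = 3 - seq[seq.length - 1 - i];
        }
        return rc;
    }

    /**
     * Returns {P(forward), P(reverse)} from the two log-scores and the prior
     * probability priorFwd (strictly between 0 and 1) of the forward strand.
     */
    static double[] of(double logScoreFwd, double logScoreRev, double priorFwd) {
        double a = Math.log(priorFwd) + logScoreFwd;
        double b = Math.log(1 - priorFwd) + logScoreRev;
        // Equivalent to exp(a) / (exp(a) + exp(b)), written so large scores do not overflow.
        double pFwd = 1.0 / (1.0 + Math.exp(b - a));
        return new double[] {pFwd, 1.0 - pFwd};
    }
}
```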
public void trainBgModel(DataSet data, double[] weights) throws Exception
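Description copied from class: HiddenMotifMixture
This method trains the background model.
Declared in: trainBgModel in class HiddenMotifMixture
Parameters:
data - the data set
weights - the weights of the sequences in data
Throws:
Exception - if something went wrong
protected double iterate(int start, double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception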
Description copied from class: AbstractMixtureTrainSM
This method runs the training algorithm for the current model on the internal data set.
Declared in: iterate in class AbstractMixtureTrainSM
Parameters:
start - the index of the training run
dataWeights - the weights for each sequence or null
m - the random generator for initiating the algorithm
params - the parameters for the sequences
Throws:
Exception - if something went wrong
See Also:
AbstractMixtureTrainSM.doFirstIteration(DataSet, double[], MultivariateRandomGenerator, MRGParams[]), AbstractMixtureTrainSM.continueIterations(double[], double[][]), AbstractMixtureTrainSM.continueIterations(double[], double[][], int, int)
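iterate runs the training algorithm once; the starts argument of the constructors determines how many such runs the train method performs, and tc decides when an EM run stops. The sketch below mimics that structure on a deliberately small problem: it keeps the per-sequence log-likelihoods under the motif and no-motif components fixed and estimates only the motif probability, restarting EM from several random values and keeping the run with the highest log-likelihood. MotifProbEM, its stopping rule (improvement below eps) and all parameters are assumptions for illustration, not the Jstacs algorithm.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not the Jstacs algorithm): EM with random restarts that
 * estimates only the ZOOPS motif probability q, given fixed per-sequence
 * log-likelihoods under the "motif" and "no motif" components.
 */
final class MotifProbEM {

    private MotifProbEM() {}

    /**
     * One EM run from the initial value q0. Stops when the log-likelihood
     * improves by less than eps (a simple stand-in for a TerminationCondition)
     * or after maxIter iterations. Returns {q, log-likelihood of the last E-step}.
     */
    static double[] run(double[] logMotif, double[] logBg, double q0, double eps, int maxIter) {
        double q = q0;
        double previous = Double.NEGATIVE_INFINITY;
        double ll = Double.NEGATIVE_INFINITY;
        for (int it = 0; it < maxIter; it++) {
            double expectedMotif = 0;
            ll = 0;
            for (int n = 0; n < logMotif.length; n++) {
                double a = Math.log(q) + logMotif[n];  // log P(s_n, motif)
                double b = Math.log(1 - q) + logBg[n]; // log P(s_n, no motif)
                double m = Math.max(a, b);
                double logP = m + Math.log(Math.exp(a - m) + Math.exp(b - m));
                expectedMotif += Math.exp(a - logP);   // E-step: P(motif | s_n)
                ll += logP;
            }
            q = expectedMotif / logMotif.length;       // M-step
            if (ll - previous < eps) {
                break;
            }
            previous = ll;
        }
        return new double[] {q, ll};
    }

    /** Runs EM from `starts` random initial values and returns the best q. */
    static double bestOfStarts(double[] logMotif, double[] logBg, int starts, long seed) {
        if (starts < 1) {
            throw new IllegalArgumentException("starts < 1");
        }
        Random rng = new Random(seed);
        double bestQ = Double.NaN;
        double bestLl = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < starts; s++) {
            double q0 = 0.05 + 0.9 * rng.nextDouble(); // initial q in [0.05, 0.95)
            double[] result = run(logMotif, logBg, q0, 1e-10, 1000);
            if (result[1] > bestLl) {
                bestLl = result[1];
                bestQ = result[0];
            }
        }
        return bestQ;
    }
}
```

For this one-parameter problem the log-likelihood is concave in q, so every start reaches the same optimum; restarts matter for the full motif model, whose likelihood typically has many local optima.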
public void setShiftCorrection(boolean correct)
Enables or disables the phase shift correction.
Parameters:
correct - switch that determines whether to correct shifts or not