java.lang.Object
de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.ZOOPSTrainSM
public class ZOOPSTrainSM
This class enables the user to search for a single motif in a sequence. The
model can be trained under either the "one occurrence per sequence" (OOPS)
assumption or the "zero or one occurrence per sequence" (ZOOPS) assumption.
If EM is used for training, the parameters are trained in a MEME-like manner.
Currently only EM is implemented.
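To make the OOPS/ZOOPS distinction concrete, the following sketch computes, for a single sequence, the posterior probability of every motif start position and of the "no motif" case, roughly as the E-step of a MEME-like EM would. It does not use the Jstacs API: the class `ZoopsPosteriorSketch`, the position weight matrix, the position-independent background, and the uniform position prior are all illustrative assumptions. Under these assumptions, `motifProb = 1` corresponds to OOPS.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of Jstacs.
final class ZoopsPosteriorSketch {

    /**
     * Returns the posterior for one sequence: index i < n means "motif starts at i",
     * the last index means "sequence contains no motif".
     * seq holds symbol indices (A=0, C=1, G=2, T=3); pwm[j][a] = P(symbol a at motif position j).
     */
    static double[] posterior(int[] seq, double[][] pwm, double[] bg, double motifProb) {
        int w = pwm.length;
        int n = seq.length - w + 1; // number of possible start positions
        if (n < 1) {
            throw new IllegalArgumentException("sequence shorter than motif");
        }
        // log-likelihood of the whole sequence under the background model
        double logBg = 0;
        for (int s : seq) {
            logBg += Math.log(bg[s]);
        }
        double[] logJoint = new double[n + 1];
        for (int i = 0; i < n; i++) {
            // swap background terms for motif terms inside the window [i, i + w)
            double l = logBg;
            for (int j = 0; j < w; j++) {
                l += Math.log(pwm[j][seq[i + j]]) - Math.log(bg[seq[i + j]]);
            }
            // assumed uniform prior over the n start positions
            logJoint[i] = Math.log(motifProb) - Math.log(n) + l;
        }
        // "no motif" case; evaluates to -Infinity when motifProb == 1 (OOPS)
        logJoint[n] = Math.log(1 - motifProb) + logBg;

        // normalise in log space (log-sum-exp) for numerical stability
        double max = Arrays.stream(logJoint).max().getAsDouble();
        double sum = 0;
        for (double v : logJoint) {
            sum += Math.exp(v - max);
        }
        double[] post = new double[n + 1];
        for (int i = 0; i <= n; i++) {
            post[i] = Math.exp(logJoint[i] - max) / sum;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] bg = {0.25, 0.25, 0.25, 0.25};
        double[][] pwm = {
            {0.05, 0.05, 0.05, 0.85}, // prefers T
            {0.05, 0.05, 0.05, 0.85}, // prefers T
            {0.05, 0.05, 0.85, 0.05}  // prefers G
        };
        int[] seq = {0, 1, 2, 3, 3, 2, 1, 0}; // ACGTTGCA
        System.out.println("ZOOPS: " + Arrays.toString(posterior(seq, pwm, bg, 0.5)));
        System.out.println("OOPS:  " + Arrays.toString(posterior(seq, pwm, bg, 1.0)));
    }
}
```

In the OOPS call the last entry is zero, so all posterior mass is distributed over start positions; in the ZOOPS call some mass remains on the "no motif" case.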
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
|---|
AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization |
| Nested classes/interfaces inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
|---|
MotifDiscoverer.KindOfProfile |
| Field Summary | | |
|---|---|---|
| protected byte | bgMaxMarkovOrder | The order of the background model. |
| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture |
|---|
posPrior |
| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
|---|
algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights |
| Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
|---|
alphabets, length |
| Constructor Summary | | |
|---|---|---|
| | ZOOPSTrainSM(StringBuffer xml) | The standard constructor for the interface Storable. |
| protected | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) | Creates a new ZOOPSTrainSM. |
| | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) | Creates a new ZOOPSTrainSM using EM and estimating the probability for finding a motif. |
| | ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) | Creates a new ZOOPSTrainSM using EM and fixed probability for finding a motif. |
| Method Summary | | |
|---|---|---|
| protected double[][] | createSeqWeightsArray() | Creates an array that can be used for weighting sequences in the algorithm. |
| protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) | This method will do the first step in the train algorithm for the current model on the internal data set. |
| int | getGlobalIndexOfMotifInComponent(int component, int motif) | Returns the global index of the motif used in component. |
| protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end) | Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
| int | getMinimalSequenceLength() | Returns the minimal length that a sequence or a data set must have. |
| int | getMotifLength(int motif) | This method returns the length of the motif with index motif. |
| protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) | Computes sequence weights and returns the score. |
| int | getNumberOfMotifs() | Returns the number of motifs for this MotifDiscoverer. |
| int | getNumberOfMotifsInComponent(int component) | Returns the number of motifs that are used in the component component of this MotifDiscoverer. |
| double[] | getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind) | Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. |
| double[] | getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos) | This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component. |
| protected double | iterate(int start, double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) | This method runs the train algorithm for the current model and the internal data set. |
| protected double | modify(double[] containsMotif, double[] startpos, int start, int end) | This method modifies the computed weights for one sequence and returns the score. |
| void | setShiftCorrection(boolean correct) | Enables or disables the phase shift correction. |
| protected void | setTrainData(DataSet data) | This method is invoked by the train-method and sets, for a given data set, the data set that should be used for training. |
| void | trainBgModel(DataSet data, double[] weights) | This method trains the background model. |
| Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture |
|---|
checkLength, clone, emitDataSetUsingCurrentParameterSet, extractFurtherInformation, getFurtherInformation, getInstanceName, getNewParameters, toString, train |
| Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
|---|
algorithmHasBeenRun, checkModelsForGibbsSampling, continueIterations, continueIterations, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, finalize, fromXML, getCharacteristics, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML |
| Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
|---|
check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
|---|
getIndexOfMaximalComponentFor, getNumberOfComponents |
| Methods inherited from interface de.jstacs.Storable |
|---|
toXML |
| Field Detail |
|---|
protected byte bgMaxMarkovOrder
The order of the background model.
| Constructor Detail |
|---|
protected ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
double[] weights,
PositionPrior posPrior,
AbstractMixtureTrainSM.Algorithm algorithm,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a new ZOOPSTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.
Parameters:
motif - the motif model; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site; if trainOnlyMotifModel == false and algorithm == AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel - a switch whether to train only the motif model
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  * estimateComponentProbs == true
  * null or has to have length dimension
  * null or an array with all values zero (0) then ML
  * parameterization
weights - null or the weights for the components (then weights.length == dimension)
posPrior - this object determines the positional distribution that shall be used
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM; the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM; the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
  * weights != null && weights.length != 2
  * weights != null and there exists an i where weights[i] < 0
  * starts < 1
  * componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
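The recommendation alpha = 1 for the alpha parameter can be illustrated with a small sampler: a symmetric Dirichlet distribution with alpha = 1 is uniform on the simplex, so every weight vector that sums to 1 is equally likely as a starting point. The sketch below is not the Jstacs implementation; the class `DirichletInitSketch` and its methods are hypothetical, and it assumes that the "gammas" are per-sequence component weights drawn once per EM start.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of Jstacs.
final class DirichletInitSketch {

    /** Draws from Gamma(shape, 1) using the Marsaglia-Tsang method. */
    static double sampleGamma(double shape, Random rnd) {
        if (shape < 1) {
            // boosting step: Gamma(a) is distributed as Gamma(a + 1) * U^(1/a)
            return sampleGamma(shape + 1, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rnd.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0) {
                continue; // reject and redraw
            }
            v = v * v * v;
            double u = rnd.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /** Draws one point from a k-dimensional symmetric Dirichlet(alpha) distribution. */
    static double[] sampleSymmetricDirichlet(int k, double alpha, Random rnd) {
        if (k < 1 || !(alpha > 0)) {
            throw new IllegalArgumentException("k must be >= 1 and alpha must be positive");
        }
        double[] g = new double[k];
        double sum = 0;
        for (int i = 0; i < k; i++) {
            g[i] = sampleGamma(alpha, rnd);
            sum += g[i];
        }
        for (int i = 0; i < k; i++) {
            g[i] /= sum; // normalised Gamma draws are Dirichlet distributed
        }
        return g;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1L);
        // two components, e.g. "contains motif" and "contains no motif" (assumed meaning)
        for (int s = 0; s < 3; s++) {
            System.out.println(Arrays.toString(sampleSymmetricDirichlet(2, 1.0, rnd)));
        }
    }
}
```

Values of alpha below 1 would push the initial weights towards the corners of the simplex, values above 1 towards its centre.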
public ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
PositionPrior posPrior,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and estimating the probability for finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel - a switch whether to train only the motif model
starts - the number of times the algorithm will be started in the train-method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior
  * estimateComponentProbs == true
  * null or has to have length dimension
  * null or an array with all values zero (0) then ML
  * parameterization
posPrior - this object determines the positional distribution that shall be used
alpha - the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
  * starts < 1
  * componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.EM
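The componentHyperParams description distinguishes the ML case (null or all zeros) from the case where a prior is in effect. The sketch below shows one common way such hyperparameters enter an estimate, namely as pseudo-counts added to the expected component counts. This is an assumption made for illustration: the class `ComponentProbSketch` is hypothetical, and the exact formula used by Jstacs may differ, in particular between parameterizations.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of Jstacs.
final class ComponentProbSketch {

    /**
     * Estimates component probabilities from expected counts.
     * hyperParams == null or all zeros gives the ML estimate;
     * otherwise the hyperparameters are added as pseudo-counts (assumed behaviour).
     */
    static double[] estimate(double[] expectedCounts, double[] hyperParams) {
        int k = expectedCounts.length;
        if (hyperParams != null && hyperParams.length != k) {
            throw new IllegalArgumentException("hyperParams must be null or have length " + k);
        }
        double[] p = new double[k];
        double total = 0;
        for (int i = 0; i < k; i++) {
            double a = hyperParams == null ? 0.0 : hyperParams[i];
            if (a < 0) {
                throw new IllegalArgumentException("hyperparameters must not be negative");
            }
            p[i] = expectedCounts[i] + a;
            total += p[i];
        }
        if (!(total > 0)) {
            throw new IllegalArgumentException("no counts to estimate from");
        }
        for (int i = 0; i < k; i++) {
            p[i] /= total;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] counts = {3.0, 1.0}; // e.g. expected numbers of sequences with and without a motif
        System.out.println("ML:    " + Arrays.toString(estimate(counts, null)));
        System.out.println("prior: " + Arrays.toString(estimate(counts, new double[] {1.0, 1.0})));
    }
}
```

With few sequences the pseudo-counts pull the estimate towards the prior; with many sequences their influence vanishes.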
public ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double motifProb,
PositionPrior posPrior,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a new ZOOPSTrainSM using EM and a fixed probability for finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel - a switch whether to train only the motif model
starts - the number of times the algorithm will be started in the train-method, at least 1
motifProb - the probability of finding a motif in a sequence (in [0,1])
posPrior - this object determines the positional distribution that shall be used
alpha - the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM; the TerminationCondition for stopping the EM-algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
  * motifProb < 0 or motifProb > 1
  * starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models can not be cloned
See Also:
ZOOPSTrainSM(de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel, boolean, int, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.positionprior.PositionPrior, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, de.jstacs.algorithms.optimization.termination.TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest),
AbstractMixtureTrainSM.Algorithm.EM
public ZOOPSTrainSM(StringBuffer xml)
throws NonParsableException
The standard constructor for the interface Storable. Creates a new ZOOPSTrainSM out of its XML representation.
Parameters:
xml - the XML representation of the model as a StringBuffer
Throws:
NonParsableException - if the StringBuffer can not be parsed
| Method Detail |
|---|
protected void setTrainData(DataSet data)
throws Exception
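Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train-method and sets, for a given data set, the data set that should be used for training.
setTrainData in class AbstractMixtureTrainSM
Parameters:
data - the given data set of sequences
Throws:
Exception - if something went wrong
protected double[][] createSeqWeightsArray()
Description copied from class: AbstractMixtureTrainSM
Creates an array that can be used for weighting sequences in the algorithm.
createSeqWeightsArray in class AbstractMixtureTrainSM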
protected double[][] doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception
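Description copied from class: AbstractMixtureTrainSM
This method will do the first step in the train algorithm for the current model on the internal data set.
doFirstIteration in class AbstractMixtureTrainSM
Parameters:
dataWeights - null or the weights of each element of the data set
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong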
protected double getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
throws Exception
Description copied from class: AbstractMixtureTrainSM
Computes sequence weights and returns the score.
getNewWeights in class AbstractMixtureTrainSM
Parameters:
dataWeights - the weights for the internal data set (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)
Throws:
Exception - if something went wrong
protected double modify(double[] containsMotif,
double[] startpos,
int start,
int end)
This method modifies the computed weights for one sequence and returns the score.
Parameters:
containsMotif - an array to return the weights for containing a motif (index 0) or containing no motif (index 1)
startpos - the array containing the scores for each start position (including no motif in the sequence)
start - the start index
end - the end index
protected double getLogProbUsingCurrentParameterSetFor(int component,
Sequence seq,
int start,
int end)
throws Exception
Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability for the sequence and the given component using the current parameter set.
getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM
Parameters:
component - the index of the component
seq - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureTrainSM.getNumberOfComponents()
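The identity log P(s,component) = log P(s|component) + log P(component) can be checked with a few lines of arithmetic. The sketch below is hypothetical (the class `JointLogProbSketch` is not part of Jstacs); it adds the two log terms per component and then combines the joint values with log-sum-exp, which is the usual way to obtain log P(s) from them without underflow.

```java
// Hypothetical stand-alone sketch; not part of Jstacs.
final class JointLogProbSketch {

    /** log P(s, component) = log P(s | component) + log P(component). */
    static double logJoint(double logLikelihood, double logComponentProb) {
        return logLikelihood + logComponentProb;
    }

    /** Computes log(sum_i exp(v[i])) in a numerically stable way. */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // all terms have probability zero
        }
        double sum = 0;
        for (double x : v) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // made-up values: P(s | motif) = 0.2, P(s | no motif) = 0.1, both components equally likely
        double[] joint = {
            logJoint(Math.log(0.2), Math.log(0.5)),
            logJoint(Math.log(0.1), Math.log(0.5))
        };
        System.out.println("log P(s, component 0) = " + joint[0]);
        System.out.println("log P(s, component 1) = " + joint[1]);
        System.out.println("log P(s)              = " + logSumExp(joint));
    }
}
```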
public double[] getProfileOfScoresFor(int component,
int motif,
Sequence sequence,
int startpos,
MotifDiscoverer.KindOfProfile kind)
throws Exception
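Description copied from interface: MotifDiscoverer
Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. This array should be of length sequence.length() - startpos - motifs[motif].getLength() + 1.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
kind - indicates the kind of profile
Throws:
Exception - if the score could not be computed for any reason
public int getMinimalSequenceLength()
Description copied from class: HiddenMotifMixture
Returns the minimal length that a sequence or a data set must have.
getMinimalSequenceLength in class HiddenMotifMixture
public int getMotifLength(int motif)
Description copied from interface: MotifDiscoverer
This method returns the length of the motif with index motif.
Parameters:
motif - the index of the motif
Returns:
the length of the motif with index motif
public int getNumberOfMotifs()
Description copied from interface: MotifDiscoverer
Returns the number of motifs for this MotifDiscoverer.
public int getNumberOfMotifsInComponent(int component)
Description copied from interface: MotifDiscoverer
Returns the number of motifs that are used in the component component of this MotifDiscoverer.
Parameters:
component - the component of the MotifDiscoverer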
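The length documented for the result of getProfileOfScoresFor, sequence.length() - startpos - motifs[motif].getLength() + 1, is simply the number of windows of motif length that fit between startpos and the end of the sequence. The following sketch makes this visible with a plain sliding-window scorer. It is not the Jstacs implementation: the class `ProfileSketch`, the log-PWM scoring, and the symbol coding A=0, C=1, G=2, T=3 are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of Jstacs.
final class ProfileSketch {

    /**
     * Scores every window of motif length that starts at or after startpos.
     * logPwm[j][a] is the log-probability of symbol a at motif position j.
     */
    static double[] profile(int[] seq, int startpos, double[][] logPwm) {
        int w = logPwm.length;
        int n = seq.length - startpos - w + 1; // documented profile length
        if (startpos < 0 || n < 1) {
            throw new IllegalArgumentException("no complete window fits after startpos");
        }
        double[] res = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < w; j++) {
                res[i] += logPwm[j][seq[startpos + i + j]];
            }
        }
        return res;
    }

    public static void main(String[] args) {
        double[][] pwm = {
            {0.05, 0.05, 0.05, 0.85},
            {0.05, 0.05, 0.05, 0.85},
            {0.05, 0.05, 0.85, 0.05}
        };
        double[][] logPwm = new double[pwm.length][4];
        for (int j = 0; j < pwm.length; j++) {
            for (int a = 0; a < 4; a++) {
                logPwm[j][a] = Math.log(pwm[j][a]);
            }
        }
        int[] seq = {0, 1, 2, 3, 3, 2, 1, 0}; // ACGTTGCA
        // 8 - 2 - 3 + 1 = 4 entries; entry i belongs to sequence position startpos + i
        System.out.println(Arrays.toString(profile(seq, 2, logPwm)));
    }
}
```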
public double[] getStrandProbabilitiesFor(int component,
int motif,
Sequence sequence,
int startpos)
throws Exception
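Description copied from interface: MotifDiscoverer
This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component.
Parameters:
component - the component index
motif - the index of the motif in the component
sequence - the given sequence
startpos - the start position in the sequence
Throws:
Exception - if the strand could not be computed for any reason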
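As an illustration of what strand probabilities for a candidate site can look like, the sketch below scores a subsequence on the forward strand and as its reverse complement, then normalises the two likelihoods. Everything in it is an assumption rather than Jstacs behaviour: the class `StrandProbSketch`, the DNA coding A=0, C=1, G=2, T=3 (so the complement of a is 3 - a), the equal prior for both strands, and the order of the returned values (forward first).

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of Jstacs.
final class StrandProbSketch {

    /**
     * Returns {P(forward | site), P(reverse complement | site)} for the window
     * of motif length starting at startpos, assuming equal strand priors.
     * pwm[j][a] = P(symbol a at motif position j).
     */
    static double[] strandProbabilities(int[] seq, int startpos, double[][] pwm) {
        int w = pwm.length;
        if (startpos < 0 || startpos + w > seq.length) {
            throw new IllegalArgumentException("window does not fit into the sequence");
        }
        double fwd = 1.0;
        double rev = 1.0;
        for (int j = 0; j < w; j++) {
            int a = seq[startpos + j];
            fwd *= pwm[j][a];
            // on the reverse strand, site position j meets motif position w-1-j
            // and the symbol is complemented
            rev *= pwm[w - 1 - j][3 - a];
        }
        double sum = fwd + rev;
        if (!(sum > 0)) {
            throw new IllegalArgumentException("site has zero likelihood on both strands");
        }
        return new double[] {fwd / sum, rev / sum};
    }

    public static void main(String[] args) {
        double[][] pwm = {
            {0.05, 0.05, 0.05, 0.85},
            {0.05, 0.05, 0.05, 0.85},
            {0.05, 0.05, 0.85, 0.05}
        };
        int[] seq = {0, 1, 2, 3, 3, 2, 1, 0}; // ACGTTGCA
        System.out.println("TTG at 3: " + Arrays.toString(strandProbabilities(seq, 3, pwm)));
        int[] rc = {1, 0, 0};                 // CAA, the reverse complement of TTG
        System.out.println("CAA at 0: " + Arrays.toString(strandProbabilities(rc, 0, pwm)));
    }
}
```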
public int getGlobalIndexOfMotifInComponent(int component,
int motif)
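Description copied from interface: MotifDiscoverer
Returns the global index of the motif used in component. The index returned must be at least 0 and less than MotifDiscoverer.getNumberOfMotifs().
Parameters:
component - the component index
motif - the motif index in the component
Returns:
the global index of the motif in component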
public void trainBgModel(DataSet data,
double[] weights)
throws Exception
Description copied from class: HiddenMotifMixture
This method trains the background model.
trainBgModel in class HiddenMotifMixture
Parameters:
data - the data set
weights - the weights
Throws:
Exception - if something went wrong
protected double iterate(int start,
double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception
Description copied from class: AbstractMixtureTrainSM
This method runs the train algorithm for the current model and the internal data set.
iterate in class AbstractMixtureTrainSM
Parameters:
start - the index of the training
dataWeights - the weights for each sequence or null
m - the random generator for initiating the algorithm
params - the parameters for the sequences
Throws:
Exception - if something went wrong
See Also:
AbstractMixtureTrainSM.doFirstIteration(DataSet, double[], MultivariateRandomGenerator, MRGParams[]),
AbstractMixtureTrainSM.continueIterations(double[], double[][]),
AbstractMixtureTrainSM.continueIterations(double[], double[][], int, int)
public void setShiftCorrection(boolean correct)
Enables or disables the phase shift correction.
Parameters:
correct - switch that determines whether to correct shifts or not