java.lang.Object
de.jstacs.models.AbstractModel
de.jstacs.models.mixture.AbstractMixtureModel
de.jstacs.models.mixture.motif.HiddenMotifMixture
de.jstacs.models.mixture.motif.SingleHiddenMotifMixture
public class SingleHiddenMotifMixture
This class enables the user to search for a single motif in a sequence. The model can be trained under either the
"one occurrence per sequence" (OOPS) or the "zero or one occurrence per sequence" (ZOOPS) assumption.
If EM is used for training, the parameters are trained in a MEME-like manner.
Currently only EM is implemented.
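To make the difference between OOPS and ZOOPS concrete, the following sketch computes, for one sequence, the posterior weight of each possible motif start position. It is not the Jstacs implementation: the class `OopsZoopsSketch`, the method `posterior`, the uniform position prior, and the likelihood values are all hypothetical. Under ZOOPS an extra "no motif" outcome competes with the start positions; under OOPS it does not.

```java
import java.util.Arrays;

final class OopsZoopsSketch {

    /**
     * Returns normalized posterior weights for one sequence.
     * Entries 0..n-1 belong to the motif start positions; under ZOOPS one
     * additional last entry holds the "no motif" outcome (assumed layout).
     *
     * siteLikelihoods[i]: likelihood of the sequence with the motif at start i
     * bgLikelihood:       likelihood of the sequence under the background alone
     * motifProb:          prior probability that the sequence contains the motif
     */
    static double[] posterior(double[] siteLikelihoods, double bgLikelihood,
                              double motifProb, boolean zoops) {
        int n = siteLikelihoods.length;
        double[] w = new double[zoops ? n + 1 : n];
        // OOPS: every sequence is assumed to contain exactly one site.
        double pMotif = zoops ? motifProb : 1.0;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            w[i] = pMotif * siteLikelihoods[i] / n; // uniform position prior
            sum += w[i];
        }
        if (zoops) {
            w[n] = (1.0 - motifProb) * bgLikelihood;
            sum += w[n];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= sum;
        }
        return w;
    }

    public static void main(String[] args) {
        double[] site = {0.02, 0.10, 0.04}; // made-up likelihoods
        System.out.println("OOPS:  " + Arrays.toString(posterior(site, 0.03, 0.5, false)));
        System.out.println("ZOOPS: " + Arrays.toString(posterior(site, 0.03, 0.5, true)));
    }
}
```

In a MEME-like EM scheme, weights of this kind would serve as the expected assignments in the E-step.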
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class de.jstacs.models.mixture.AbstractMixtureModel |
|---|
AbstractMixtureModel.Algorithm, AbstractMixtureModel.Parameterization |
| Nested classes/interfaces inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
|---|
MotifDiscoverer.KindOfProfile |
| Field Summary |
|---|
| Fields inherited from class de.jstacs.models.mixture.motif.HiddenMotifMixture |
|---|
bgMaxMarkovOrder, posPrior, trainOnlyMotifModel |
| Fields inherited from class de.jstacs.models.mixture.AbstractMixtureModel |
|---|
algorithm, algorithmHasBeenRun, alternativeModel, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, sostream, starts, stationaryIteration, weights |
| Fields inherited from class de.jstacs.models.AbstractModel |
|---|
alphabets, length |
| Constructor Summary | |
|---|---|
protected |
SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
double[] weights,
PositionPrior posPrior,
AbstractMixtureModel.Algorithm algorithm,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
Creates a new SingleHiddenMotifMixture. |
|
SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
PositionPrior posPrior,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization)
Creates a single hidden motif mixture using EM and estimating the probability for finding a motif. |
|
SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double motifProb,
PositionPrior posPrior,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization)
Creates a single hidden motif mixture using EM and fixed probability for finding a motif. |
|
SingleHiddenMotifMixture(StringBuffer xml)
The standard constructor for the interface Storable. |
| Method Summary | |
|---|---|
protected double[][] |
createSeqWeightsArray()
Creates an array that can be used for weighting sequences in the algorithm. |
protected double[][] |
doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
This method will do the first step in the train algorithm for the current model on the internal sample. |
int |
getGlobalIndexOfMotifInComponent(int component,
int motif)
Returns the global index of the motif used in component. |
protected double |
getLogProbUsingCurrentParameterSetFor(int component,
Sequence seq,
int start,
int end)
Returns the log probability for the sequence and the given component using the current parameter set. |
int |
getMinimalSequenceLength()
Returns the minimal length that a sequence or a sample must have. |
int |
getMotifLength(int motif)
This method returns the length of the motif with index motif. |
protected double |
getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
Computes sequence weights and returns the score. |
int |
getNumberOfMotifs()
Returns the number of motifs for this MotifDiscoverer. |
int |
getNumberOfMotifsInComponent(int component)
Returns the number of motifs that are used in the component component of this motif discoverer. |
double[] |
getProfileOfScoresFor(int component,
int motif,
Sequence sequence,
int startpos,
MotifDiscoverer.KindOfProfile kind)
Returns the profile of the scores for component component and motif motif at all possible
start-positions of the motif in the sequence sequence. |
StrandedLocatedSequenceAnnotationWithLength.Strand |
getStrandFor(int component,
int motif,
Sequence sequence,
int startpos)
This method returns the strand for a given subsequence if it is considered a site of the motif model in a specific component. |
protected double |
modify(double[] containsMotif,
double[] startpos,
int start,
int end)
This method modifies the computed weights for one sequence and returns the score. |
protected void |
setTrainData(Sample data)
This method is invoked by the train method and sets, for a given sample, the sample that should be used for training. |
| Methods inherited from class de.jstacs.models.mixture.motif.HiddenMotifMixture |
|---|
checkLength, clone, emitSampleUsingCurrentParameterSet, extractFurtherInformation, getFurtherInformation, getInstanceName, getLogPriorTerm, getNewParameters, toString, train, trainBgModel |
| Methods inherited from class de.jstacs.models.AbstractModel |
|---|
getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogProbFor, getMaximalMarkovOrder, getPriorTerm, getProbFor, getProbFor, setNewAlphabetContainerInstance, train |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
|---|
getIndexOfMaximalComponentFor, getNumberOfComponents |
| Methods inherited from interface de.jstacs.Storable |
|---|
toXML |
| Constructor Detail |
|---|
protected SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
double[] weights,
PositionPrior posPrior,
AbstractMixtureModel.Algorithm algorithm,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a new SingleHiddenMotifMixture.
Parameters:
motif - the motif model; if the model is trained using AbstractMixtureModel.Algorithm.GIBBS_SAMPLING, the model has to implement GibbsSamplingComponent.
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site; if trainOnlyMotifModel=false and algorithm=AbstractMixtureModel.Algorithm.GIBBS_SAMPLING, the model has to implement GibbsSamplingComponent. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior,
  estimateComponentProbs == true
  null or has to have length dimension
  null or an array with all values zero (0), then ML
  parameterization
weights - null or the weights for the components (then weights.length == dimension)
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
algorithm - either AbstractMixtureModel.Algorithm.EM or AbstractMixtureModel.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureModel.Algorithm.EM: the positive parameter for the Dirichlet which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
eps - only for AbstractMixtureModel.Algorithm.EM: the non-negative threshold for stopping the EM-algorithm
parametrization - only for AbstractMixtureModel.Algorithm.EM: the type of the component probability parameterization; AbstractMixtureModel.Parameterization.THETA or AbstractMixtureModel.Parameterization.LAMBDA
  LAMBDA
initialIteration - only for AbstractMixtureModel.Algorithm.GIBBS_SAMPLING
  stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureModel.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureModel.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
  weights != null && weights.length != 2
  weights != null and there exists an i where weights[i] < 0
  starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models cannot be cloned
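The conditions documented for IllegalArgumentException can be checked before the constructor is called, which may give clearer error messages in calling code. The helper below is a hypothetical sketch (`ArgumentCheckSketch.checkArguments` is not part of Jstacs) that mirrors only the three documented conditions on `weights` and `starts`.

```java
final class ArgumentCheckSketch {

    /**
     * Throws IllegalArgumentException if
     *   - starts < 1,
     *   - weights != null and weights.length != 2, or
     *   - weights != null and some weights[i] < 0.
     */
    static void checkArguments(double[] weights, int starts) {
        if (starts < 1) {
            throw new IllegalArgumentException("starts must be at least 1");
        }
        if (weights != null) {
            if (weights.length != 2) {
                throw new IllegalArgumentException("weights must have length 2");
            }
            for (int i = 0; i < weights.length; i++) {
                if (weights[i] < 0) {
                    throw new IllegalArgumentException("weights[" + i + "] is negative");
                }
            }
        }
    }

    public static void main(String[] args) {
        checkArguments(null, 1);                      // accepted: weights may be null
        checkArguments(new double[] {0.8, 0.2}, 10);  // accepted
        try {
            checkArguments(new double[] {0.5, 0.3, 0.2}, 10);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```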
public SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
PositionPrior posPrior,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a single hidden motif mixture using EM and estimating the probability for finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior,
  estimateComponentProbs == true
  null or has to have length dimension
  null or an array with all values zero (0), then ML
  parameterization
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter for the Dirichlet which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
eps - the non-negative threshold for stopping the EM-algorithm
parametrization - the type of the component probability parameterization; AbstractMixtureModel.Parameterization.THETA or AbstractMixtureModel.Parameterization.LAMBDA
  LAMBDA
Throws:
IllegalArgumentException - if
  starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
SingleHiddenMotifMixture(de.jstacs.models.Model, de.jstacs.models.Model, boolean, int, double[], double[], de.jstacs.models.mixture.motif.positionprior.PositionPrior, de.jstacs.models.mixture.AbstractMixtureModel.Algorithm, double, double, de.jstacs.models.mixture.AbstractMixtureModel.Parameterization, int, int, de.jstacs.models.mixture.gibbssampling.BurnInTest),
AbstractMixtureModel.Algorithm.EM
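The recommendation alpha = 1 corresponds to a uniform distribution on the simplex. As a rough illustration of what such an initialization could look like (this is not the Jstacs code, and the class and method names are hypothetical), a draw from Dirichlet(1, ..., 1) can be obtained by normalizing independent Exponential(1) variates. The sketch covers only the case alpha = 1.

```java
import java.util.Arrays;
import java.util.Random;

final class DirichletInitSketch {

    /** Draws a point uniformly from the (n-1)-simplex, i.e. from Dirichlet(1,...,1). */
    static double[] uniformOnSimplex(int n, Random rng) {
        double[] g = new double[n];
        double sum;
        do {
            sum = 0;
            for (int i = 0; i < n; i++) {
                // -log(1 - U) with U in [0,1) is Exponential(1), i.e. Gamma(1, 1).
                g[i] = -Math.log(1.0 - rng.nextDouble());
                sum += g[i];
            }
        } while (sum <= 0); // redraw in the (practically impossible) all-zero case
        for (int i = 0; i < n; i++) {
            g[i] /= sum; // normalizing Gamma variates yields a Dirichlet draw
        }
        return g;
    }

    public static void main(String[] args) {
        // Initial weights for three hypothetical start positions.
        System.out.println(Arrays.toString(uniformOnSimplex(3, new Random())));
    }
}
```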
public SingleHiddenMotifMixture(Model motif,
Model bg,
boolean trainOnlyMotifModel,
int starts,
double motifProb,
PositionPrior posPrior,
double alpha,
double eps,
AbstractMixtureModel.Parameterization parametrization)
throws CloneNotSupportedException,
IllegalArgumentException,
WrongAlphabetException
Creates a single hidden motif mixture using EM and fixed probability for finding a motif.
Parameters:
motif - the motif model
bg - the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts - the number of times the algorithm will be started in the train method, at least 1
motifProb - the probability of finding a motif in a sequence (in [0,1])
posPrior - this object determines the positional distribution that shall be used
trainOnlyMotifModel - a switch whether to train only the motif model
alpha - the positive parameter for the Dirichlet which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
eps - the non-negative threshold for stopping the EM-algorithm
parametrization - the type of the component probability parameterization; AbstractMixtureModel.Parameterization.THETA or AbstractMixtureModel.Parameterization.LAMBDA
  LAMBDA
Throws:
IllegalArgumentException - if
  motifProb < 0 or motifProb > 1
  starts < 1
WrongAlphabetException - if not all models work on the same simple alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
SingleHiddenMotifMixture(de.jstacs.models.Model, de.jstacs.models.Model, boolean, int, double[], double[], de.jstacs.models.mixture.motif.positionprior.PositionPrior, de.jstacs.models.mixture.AbstractMixtureModel.Algorithm, double, double, de.jstacs.models.mixture.AbstractMixtureModel.Parameterization, int, int, de.jstacs.models.mixture.gibbssampling.BurnInTest),
AbstractMixtureModel.Algorithm.EM
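A fixed motifProb can be thought of as a pair of fixed component weights. The sketch below shows one plausible mapping; it is an assumption, not taken from the Jstacs source. The index convention (0 = sequence contains the motif, 1 = it does not) is borrowed from the containsMotif parameter of modify, and the class and method names are hypothetical. Conceptually, motifProb = 1 leaves no weight for the "no motif" component, which matches the OOPS assumption.

```java
import java.util.Arrays;

final class MotifProbSketch {

    /**
     * Maps a fixed motif probability to two component weights.
     * Index 0: the sequence contains the motif; index 1: it does not
     * (assumed convention).
     */
    static double[] toComponentWeights(double motifProb) {
        // Written so that NaN is rejected as well.
        if (!(motifProb >= 0 && motifProb <= 1)) {
            throw new IllegalArgumentException("motifProb must be in [0,1]");
        }
        return new double[] {motifProb, 1.0 - motifProb};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toComponentWeights(0.75)));
        System.out.println(Arrays.toString(toComponentWeights(1.0))); // OOPS-like
    }
}
```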
public SingleHiddenMotifMixture(StringBuffer xml)
throws NonParsableException
The standard constructor for the interface Storable.
Parameters:
xml - the StringBuffer containing the model
Throws:
NonParsableException - if the StringBuffer cannot be parsed
| Method Detail |
|---|
protected void setTrainData(Sample data)
throws Exception
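Description copied from class: AbstractMixtureModel
This method is invoked by the train method and sets, for a given sample, the sample that should be used for training.
setTrainData in class AbstractMixtureModel
Parameters:
data - the given sample
Throws:
Exception
protected double[][] createSeqWeightsArray()
Description copied from class: AbstractMixtureModel
Creates an array that can be used for weighting sequences in the algorithm.
createSeqWeightsArray in class AbstractMixtureModel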
protected double[][] doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception
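Description copied from class: AbstractMixtureModel
This method will do the first step in the train algorithm for the current model on the internal sample.
doFirstIteration in class AbstractMixtureModel
Parameters:
dataWeights - null or the weights of each element of the sample
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong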
protected double getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
throws Exception
Description copied from class: AbstractMixtureModel
Computes sequence weights and returns the score.
getNewWeights in class AbstractMixtureModel
Parameters:
dataWeights - the weights for the internal sample (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)
Throws:
Exception - if something went wrong
protected double modify(double[] containsMotif,
double[] startpos,
int start,
int end)
This method modifies the computed weights for one sequence and returns the score.
Parameters:
containsMotif - an array to return the weights for containing a motif (index 0) or containing no motif (index 1)
startpos - the array containing the scores for each start position (including no motif in the sequence)
start - the start index
end - the end index
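The array layout described for modify can be illustrated with a small normalization routine. The sketch below is not the Jstacs implementation. It assumes, without confirmation from this page, that startpos holds log scores, that end is exclusive, and that the last entry of the range is the "no motif" outcome; the class and method names are hypothetical. It uses the log-sum-exp technique to normalize the scores in a numerically stable way and returns the log of their sum as the score.

```java
import java.util.Arrays;

final class ModifySketch {

    /**
     * Normalizes the log scores in startpos[start..end-1] in place to weights
     * that sum to 1, fills containsMotif, and returns log(sum of exp(scores)).
     * Assumes at least one finite score in the range.
     */
    static double normalize(double[] containsMotif, double[] startpos, int start, int end) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = start; i < end; i++) {
            max = Math.max(max, startpos[i]);
        }
        double sum = 0;
        for (int i = start; i < end; i++) {
            startpos[i] = Math.exp(startpos[i] - max); // shift by max for stability
            sum += startpos[i];
        }
        double motif = 0;
        for (int i = start; i < end; i++) {
            startpos[i] /= sum;
            if (i < end - 1) {
                motif += startpos[i]; // all entries except the assumed "no motif" slot
            }
        }
        containsMotif[0] = motif;             // weight for containing a motif
        containsMotif[1] = startpos[end - 1]; // weight for containing no motif
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[] scores = {Math.log(1), Math.log(2), Math.log(1)}; // made-up values
        double[] containsMotif = new double[2];
        double score = normalize(containsMotif, scores, 0, scores.length);
        System.out.println(score + " " + Arrays.toString(containsMotif));
    }
}
```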
protected double getLogProbUsingCurrentParameterSetFor(int component,
Sequence seq,
int start,
int end)
throws Exception
Description copied from class: AbstractMixtureModel
Returns the log probability for the sequence and the given component using the current parameter set.
getLogProbUsingCurrentParameterSetFor in class AbstractMixtureModel
Parameters:
component - the index of the component
seq - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureModel.getNumberOfComponents()
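The returned quantity is a joint log probability, log P(s,component) = log P(s|component) + log P(component). A caller who needs the marginal log P(s) can sum the joint terms over all components in the log domain. The following sketch shows this with plain arrays; the class and method names are hypothetical and no Jstacs types are used.

```java
final class JointLogProbSketch {

    /** log P(s, component) = log P(s | component) + log P(component). */
    static double logJoint(double logCond, double logPrior) {
        return logCond + logPrior;
    }

    /** log P(s) = log of the sum over components of P(s, component), via log-sum-exp. */
    static double logMarginal(double[] logCond, double[] logPrior) {
        double[] joint = new double[logCond.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < joint.length; i++) {
            joint[i] = logJoint(logCond[i], logPrior[i]);
            max = Math.max(max, joint[i]);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every component has probability zero
        }
        double sum = 0;
        for (double j : joint) {
            sum += Math.exp(j - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // Two components (motif, no motif) with made-up probabilities.
        double[] logCond = {Math.log(0.2), Math.log(0.1)};
        double[] logPrior = {Math.log(0.5), Math.log(0.5)};
        System.out.println(Math.exp(logMarginal(logCond, logPrior)));
    }
}
```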
public double[] getProfileOfScoresFor(int component,
int motif,
Sequence sequence,
int startpos,
MotifDiscoverer.KindOfProfile kind)
throws Exception
Description copied from interface: MotifDiscoverer
Returns the profile of the scores for component component and motif motif at all possible
start-positions of the motif in the sequence sequence. This array should be of length sequence.length() - startpos - motifs[motif].length() + 1.
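The length formula counts the start positions at which the motif still fits into the sequence. The sketch below evaluates it and fills a toy profile. It is only an illustration: the class and method names are hypothetical, the score (number of matches to a consensus string) stands in for whatever the chosen KindOfProfile would return, and clamping the length at 0 for sequences that are too short is an added assumption.

```java
import java.util.Arrays;

final class ProfileLengthSketch {

    /** sequenceLength - startpos - motifLength + 1, clamped at 0 (clamping is an assumption). */
    static int profileLength(int sequenceLength, int startpos, int motifLength) {
        return Math.max(0, sequenceLength - startpos - motifLength + 1);
    }

    /** Toy profile: number of positions matching the consensus at each possible start. */
    static double[] matchProfile(String sequence, int startpos, String consensus) {
        int n = profileLength(sequence.length(), startpos, consensus.length());
        double[] profile = new double[n];
        for (int i = 0; i < n; i++) {
            int matches = 0;
            for (int j = 0; j < consensus.length(); j++) {
                if (sequence.charAt(startpos + i + j) == consensus.charAt(j)) {
                    matches++;
                }
            }
            profile[i] = matches; // entry i belongs to start position startpos + i
        }
        return profile;
    }

    public static void main(String[] args) {
        System.out.println(profileLength(8, 1, 3));
        System.out.println(Arrays.toString(matchProfile("ACGTACGT", 1, "GTA")));
    }
}
```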
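Parameters:
component - the component
motif - the index of the motif in the component
sequence - the sequence
startpos - the start position
kind - indicates the kind of profile
Throws:
Exception - if the score could not be computed for any reason
public int getMinimalSequenceLength()
Description copied from class: HiddenMotifMixture
Returns the minimal length that a sequence or a sample must have.
getMinimalSequenceLength in class HiddenMotifMixture
public int getMotifLength(int motif)
Description copied from interface: MotifDiscoverer
This method returns the length of the motif with index motif.
Parameters:
motif - the index of the motif
Returns:
the length of the motif with index motif
public int getNumberOfMotifs()
Description copied from interface: MotifDiscoverer
Returns the number of motifs for this MotifDiscoverer.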
public int getNumberOfMotifsInComponent(int component)
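Description copied from interface: MotifDiscoverer
Returns the number of motifs that are used in the component component of this motif discoverer.
Parameters:
component - the component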
public StrandedLocatedSequenceAnnotationWithLength.Strand getStrandFor(int component,
int motif,
Sequence sequence,
int startpos)
throws Exception
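Description copied from interface: MotifDiscoverer
This method returns the strand for a given subsequence if it is considered a site of the motif model in a specific component.
Parameters:
component - the component
motif - the index of the motif in the component
sequence - the sequence
startpos - the start position in the sequence
Throws:
Exception - if the strand could not be computed for any reason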
public int getGlobalIndexOfMotifInComponent(int component,
int motif)
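Description copied from interface: MotifDiscoverer
Returns the global index of the motif used in component.
The index returned must be at least 0 and less than getNumberOfMotifs().
Parameters:
component - the component index
motif - the motif index in the component
Returns:
the global index of motif in component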