Class SignificantMotifOccurrencesFinder

java.lang.Object
  extended by de.jstacs.motifDiscovery.SignificantMotifOccurrencesFinder

public class SignificantMotifOccurrencesFinder
extends Object

This class enables the user to predict motif occurrences given a specific significance level.

Author:
Jan Grau, Jens Keilwagen

Nested Class Summary
static interface SignificantMotifOccurrencesFinder.JoinMethod
          Interface for methods that combine several profiles over the same sequence into one common profile.
static class SignificantMotifOccurrencesFinder.RandomSeqType
static class SignificantMotifOccurrencesFinder.SumOfProbabilities
          Joins several profiles containing log-probabilities into one profile containing the logarithm of the sum of the probabilities of the single profiles.
Constructor Summary
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, DataSet bg, double[] weights, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.JoinMethod joiner, DataSet bg, double[] weights, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, boolean oneHistogram, int numSequences, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, SignificantMotifOccurrencesFinder.JoinMethod joiner, boolean oneHistogram, int numSequences, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.
Method Summary
 DataSet annotateMotif(DataSet data, int motifIndex)
          This method annotates a DataSet.
 DataSet annotateMotif(DataSet data, int motifIndex, int addMax)
          This method annotates a DataSet.
 DataSet annotateMotif(int startPos, DataSet data, int motifIndex)
          This method annotates a DataSet starting in each sequence at startPos.
 DataSet annotateMotif(int startPos, DataSet data, int motifIndex, int addMax, boolean addAnnotation)
          This method annotates a DataSet starting in each sequence at startPos.
 MotifAnnotation[] findSignificantMotifOccurrences(int motif, Sequence seq, int start)
          This method finds the significant motif occurrences in the sequence.
 MotifAnnotation[] findSignificantMotifOccurrences(int motif, Sequence seq, int addMax, int start)
          This method finds the significant motif occurrences in the sequence.
 DataSet getBindingSites(DataSet data, int motifIndex)
          This method returns a DataSet containing the predicted binding sites.
 DataSet getBindingSites(int startPos, DataSet data, int motifIndex, int addMax, int addLeft, int addRight)
          This method returns a DataSet containing the predicted binding sites.
 double getFactorForAucPR()
          This method returns a factor by which the scores must be multiplied when computing PR curves.
 MotifDiscoverer getMotifDiscoverer()
          This method returns a clone of the internally used MotifDiscoverer.
 double getNumberOfBoundSequences(DataSet data, double[] weights, int motifIndex)
          Returns the number of sequences in data that are predicted to be bound at least once by the motif with index motifIndex.
 double getOffsetForAucPR()
          This method returns an offset that must be added to scores for computing PR curves.
 IntList getStartPositions(int startPos, DataSet data, int motifIndex, int addMax)
          This method returns a list of start positions of binding sites.
 double[][] getValuesForEachNucleotide(DataSet data, int motif, boolean addOnlyBest)
          This method determines, for each possible starting position in each of the sequences in data, a score for the event that this position is covered by at least one occurrence of the motif with index motif.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.RandomSeqType type,
                                         boolean oneHistogram,
                                         int numSequences,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
type - the type that determines how the significance level is computed
oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms
numSequences - the number of sampled sequence instances used to determine the significance level
sign - the significance level
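The constructor does not spell out how the sampled sequences turn sign into a decision rule. Below is a minimal sketch of one common approach, assuming that the random sequences are shuffled (permuted) copies of the input sequence and that a window counts as significant when at most a fraction sign of the background window scores exceed the threshold. The class ShuffledBackgroundThreshold, its methods, and the toy G/C-count score are hypothetical illustrations, not part of Jstacs, and the actual RandomSeqType options may sample sequences differently.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch (not the Jstacs implementation): derive a score threshold
// for significance level sign from shuffled copies of a sequence.
final class ShuffledBackgroundThreshold {

    /** Returns a copy of seq with its characters randomly permuted (Fisher-Yates shuffle). */
    static String shuffle(String seq, Random rnd) {
        char[] c = seq.toCharArray();
        for (int i = c.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            char tmp = c[i];
            c[i] = c[j];
            c[j] = tmp;
        }
        return new String(c);
    }

    /** Scores every window of width w in numSequences shuffled copies of seq. */
    static double[] backgroundScores(String seq, int w, ToDoubleFunction<String> score,
                                     int numSequences, Random rnd) {
        int perSeq = Math.max(0, seq.length() - w + 1);
        double[] bg = new double[numSequences * perSeq];
        int k = 0;
        for (int n = 0; n < numSequences; n++) {
            String s = shuffle(seq, rnd);
            for (int i = 0; i + w <= s.length(); i++) {
                bg[k++] = score.applyAsDouble(s.substring(i, i + w));
            }
        }
        return bg;
    }

    /**
     * Returns a threshold t such that at most a fraction sign of the background
     * scores is strictly greater than t; a window counts as significant if its
     * score exceeds t.
     */
    static double empiricalThreshold(double[] bgScores, double sign) {
        if (bgScores.length == 0) {
            throw new IllegalArgumentException("no background scores");
        }
        double[] s = bgScores.clone();
        Arrays.sort(s);
        int idx = (int) Math.ceil((1.0 - sign) * s.length) - 1;
        idx = Math.max(0, Math.min(s.length - 1, idx));
        return s[idx];
    }

    public static void main(String[] args) {
        // Toy window score (assumption): number of G/C characters in the window.
        ToDoubleFunction<String> gc = win -> win.chars().filter(ch -> ch == 'G' || ch == 'C').count();
        double[] bg = backgroundScores("ACGTTGCAAGCTTACG", 4, gc, 100, new Random(1));
        System.out.println("threshold for sign = 0.05: " + empiricalThreshold(bg, 0.05));
    }
}
```

Drawing the background from each sequence separately roughly mirrors sequence-specific histograms (oneHistogram=false); pooling the background scores of all input sequences before calling empiricalThreshold would mirror a single histogram.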


public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.RandomSeqType type,
                                         SignificantMotifOccurrencesFinder.JoinMethod joiner,
                                         boolean oneHistogram,
                                         int numSequences,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
type - the type that determines how the significance level is computed
joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined
oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms
numSequences - the number of sampled sequence instances used to determine the significance level
sign - the significance level
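The Nested Class Summary describes SumOfProbabilities as turning several log-probability profiles into the logarithm of the sum of their probabilities. A sketch of that computation, assuming each profile is a double[] of log-probabilities over the same positions; LogSumJoin is a hypothetical name, and the max-shift (log-sum-exp) trick used to avoid underflow is our choice rather than a documented detail of Jstacs.

```java
// Hypothetical sketch: joins log-probability profiles of the same motif
// (e.g. from different components) into the log of the summed probabilities.
final class LogSumJoin {

    /** Returns, for each position i, log(sum over k of exp(profiles[k][i])). */
    static double[] join(double[]... profiles) {
        if (profiles.length == 0) {
            throw new IllegalArgumentException("at least one profile is required");
        }
        int len = profiles[0].length;
        for (double[] p : profiles) {
            if (p.length != len) {
                throw new IllegalArgumentException("profiles differ in length");
            }
        }
        double[] joined = new double[len];
        for (int i = 0; i < len; i++) {
            double max = Double.NEGATIVE_INFINITY;
            for (double[] p : profiles) {
                max = Math.max(max, p[i]);
            }
            if (max == Double.NEGATIVE_INFINITY) {
                joined[i] = Double.NEGATIVE_INFINITY; // every probability is zero
                continue;
            }
            double sum = 0.0;
            for (double[] p : profiles) {
                sum += Math.exp(p[i] - max); // shifting by max avoids underflow
            }
            joined[i] = max + Math.log(sum);
        }
        return joined;
    }

    public static void main(String[] args) {
        double[] a = {Math.log(0.2), Math.log(0.5)};
        double[] b = {Math.log(0.3), Math.log(0.25)};
        double[] joined = join(a, b);
        // Back on the probability scale these are approximately 0.5 and 0.75.
        System.out.println(Math.exp(joined[0]) + " " + Math.exp(joined[1]));
    }
}
```

Working in log space keeps very small motif probabilities representable; the shift by the per-position maximum guarantees that at least one exponent is exactly 0.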


public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         DataSet bg,
                                         double[] weights,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
bg - the background data set
weights - the weights of the background data set; may be null
sign - the significance level
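With a background data set and optional weights, one plausible way to turn sign into a score threshold is a weighted empirical quantile of the background scores, with null weights treated as uniform (matching the note that weights may be null). This sketch assumes one background score per weighted item, for example one per background sequence, and calls a score significant when it is strictly greater than the returned threshold; WeightedThreshold is hypothetical and not the Jstacs implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: weighted empirical threshold from background scores.
final class WeightedThreshold {

    /**
     * Returns the smallest background score t such that the total weight of
     * background scores strictly greater than t is at most sign times the total
     * weight. If weights is null, every score gets weight 1.
     */
    static double threshold(double[] scores, double[] weights, double sign) {
        int n = scores.length;
        if (n == 0) {
            throw new IllegalArgumentException("no background scores");
        }
        double[] w = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            w[i] = (weights == null) ? 1.0 : weights[i];
            total += w[i];
        }
        double allowed = sign * total;

        // Indices of the scores, ordered by descending score.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));

        double best = scores[order[0]];
        double weightAbove = 0.0; // weight of scores strictly greater than the candidate
        int k = 0;
        while (k < n) {
            double candidate = scores[order[k]];
            if (weightAbove > allowed) {
                break; // weightAbove only grows from here on, so no smaller candidate passes
            }
            best = candidate;
            // Move all scores tied with the candidate to the "above" side.
            while (k < n && scores[order[k]] == candidate) {
                weightAbove += w[order[k]];
                k++;
            }
        }
        return best;
    }
}
```

Usage: `WeightedThreshold.threshold(bgScores, null, 0.05)` for an unweighted background, or pass per-sequence weights to let heavily weighted background sequences dominate the threshold.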


public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.JoinMethod joiner,
                                         DataSet bg,
                                         double[] weights,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined
bg - the background data set
weights - the weights of the background data set; may be null
sign - the significance level
Method Detail


public MotifAnnotation[] findSignificantMotifOccurrences(int motif,
                                                         Sequence seq,
                                                         int start)
                                                  throws Exception
This method finds the significant motif occurrences in the sequence.

Parameters:
motif - the motif index
seq - the sequence
start - the start position
Returns:
an array of MotifAnnotation for the sequence
Throws:
Exception - if the background sample could not be created, or some of the scores could not be computed
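Conceptually, finding significant occurrences amounts to scanning the motif's score profile over seq and keeping the positions that pass the threshold derived from the significance level. A hypothetical sketch, assuming a precomputed double[] profile with one score per possible start position, that start is the index at which the scan begins, and that passing means being strictly greater than the threshold; ThresholdScan is not a Jstacs class, and the real method returns MotifAnnotation objects rather than bare positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report every position from start onward whose score passes the threshold.
final class ThresholdScan {

    /** Returns all positions i >= start whose score is strictly greater than threshold. */
    static List<Integer> significantPositions(double[] profile, int start, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = Math.max(0, start); i < profile.length; i++) {
            if (profile[i] > threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] profile = {0.5, 2.0, 1.0, 3.0, 2.5};
        System.out.println(significantPositions(profile, 1, 1.5));
    }
}
```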


public MotifAnnotation[] findSignificantMotifOccurrences(int motif,
                                                         Sequence seq,
                                                         int addMax,
                                                         int start)
                                                  throws Exception
This method finds the significant motif occurrences in the sequence.

Parameters:
motif - the motif index
seq - the sequence
addMax - the maximum number of motif occurrences that can be annotated
start - the start position
Returns:
an array of MotifAnnotation for the sequence
Throws:
Exception - if the background sample could not be created, or some of the scores could not be computed
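The addMax parameter caps how many occurrences are reported. A sketch of one reasonable policy, assuming the cap keeps the addMax highest-scoring significant positions and reports them in sequence order; the ranking rule, the tie handling, and the name TopOccurrences are assumptions, not documented Jstacs behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: keep at most addMax significant positions, preferring high scores.
final class TopOccurrences {

    static List<Integer> best(double[] profile, double threshold, int addMax) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < profile.length; i++) {
            if (profile[i] > threshold) {
                hits.add(i);
            }
        }
        // Highest score first; List.sort is stable, so ties keep the smaller position first.
        hits.sort(Comparator.comparingDouble((Integer i) -> -profile[i]));
        int keep = Math.max(0, Math.min(addMax, hits.size()));
        List<Integer> kept = new ArrayList<>(hits.subList(0, keep));
        kept.sort(null); // report the kept occurrences in sequence order
        return kept;
    }
}
```

Sorting the retained hits back into positional order keeps the result convenient for annotation, where occurrences are usually listed from left to right.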


public DataSet annotateMotif(DataSet data,
                             int motifIndex)
                      throws Exception
This method annotates a DataSet.

Parameters:
data - the DataSet
motifIndex - the index of the motif
Returns:
an annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)


public DataSet annotateMotif(int startPos,
                             DataSet data,
                             int motifIndex)
                      throws Exception
This method annotates a DataSet starting in each sequence at startPos.

Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
Returns:
an annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)


public DataSet annotateMotif(DataSet data,
                             int motifIndex,
                             int addMax)
                      throws Exception
This method annotates a DataSet. At most addMax occurrences of the motif will be annotated for each motif instance.

Parameters:
data - the DataSet
motifIndex - the index of the motif
addMax - the maximum number of motif occurrences that can be annotated for each motif instance
Returns:
an annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)


public DataSet annotateMotif(int startPos,
                             DataSet data,
                             int motifIndex,
                             int addMax,
                             boolean addAnnotation)
                      throws Exception
This method annotates a DataSet starting in each sequence at startPos. At most addMax occurrences of the motif will be annotated for each motif instance.

Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the maximum number of motif occurrences that can be annotated for each motif instance
addAnnotation - a switch deciding whether the new annotation is added to the current annotation or replaces it
Returns:
an annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)
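The addAnnotation switch decides whether the new motif annotation is combined with a sequence's current annotation or replaces it. A hypothetical sketch on a plain list of strings, assuming that true means "add" and that positions found from startPos onward are shifted back to absolute positions by adding startPos; AnnotationMerge and the string labels stand in for Jstacs' own annotation classes and are not its API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the add-versus-replace choice when annotating one sequence.
final class AnnotationMerge {

    /**
     * Returns the annotation list of a sequence after annotating motif hits.
     * Hit positions are relative to startPos and are shifted to absolute positions.
     */
    static List<String> annotate(List<String> current, List<Integer> relativeHits,
                                 int startPos, int motifIndex, boolean addAnnotation) {
        // Keep the existing entries only when adding; otherwise start from scratch.
        List<String> result = addAnnotation ? new ArrayList<>(current) : new ArrayList<>();
        for (int pos : relativeHits) {
            result.add("motif " + motifIndex + " at " + (startPos + pos));
        }
        return result;
    }
}
```

The method never modifies the list it receives, so the caller can still compare the old and the new annotation.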


public DataSet getBindingSites(DataSet data,
                               int motifIndex)
                        throws Exception
This method returns a DataSet containing the predicted binding sites.

Parameters:
data - the DataSet
motifIndex - the index of the motif
Returns:
a DataSet containing the predicted binding sites
Throws:
Exception - if something went wrong


public DataSet getBindingSites(int startPos,
                               DataSet data,
                               int motifIndex,
                               int addMax,
                               int addLeft,
                               int addRight)
                        throws Exception
This method returns a DataSet containing the predicted binding sites.

Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the maximum number of motif occurrences that can be annotated for each motif instance
addLeft - the number of positions added to the left of the predicted motif occurrence
addRight - the number of positions added to the right of the predicted motif occurrence
Returns:
a DataSet containing the predicted binding sites
Throws:
Exception - if something went wrong
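The addLeft and addRight parameters pad each predicted occurrence with flanking positions. A sketch of that extraction on a plain String, assuming a motif of width w starting at each reported position and clipping the padded window at the sequence ends; the clipping rule and the name BindingSiteExtractor are assumptions (the real method might instead skip sites whose flanks run past the sequence).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cut out predicted sites plus flanks, clipped to the sequence.
final class BindingSiteExtractor {

    static List<String> extract(String seq, List<Integer> starts, int w, int addLeft, int addRight) {
        List<String> sites = new ArrayList<>();
        for (int s : starts) {
            int from = Math.max(0, s - addLeft);                  // clip at the left end
            int to = Math.min(seq.length(), s + w + addRight);    // clip at the right end
            sites.add(seq.substring(from, to));
        }
        return sites;
    }

    public static void main(String[] args) {
        System.out.println(extract("AACCGGTTAA", List.of(2), 4, 1, 2));
    }
}
```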


public IntList getStartPositions(int startPos,
                                 DataSet data,
                                 int motifIndex,
                                 int addMax)
                          throws Exception
This method returns a list of start positions of binding sites.

Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the maximum number of motif occurrences that can be annotated for each motif instance
Returns:
a list of start positions
Throws:
Exception - if something went wrong


public double getNumberOfBoundSequences(DataSet data,
                                        double[] weights,
                                        int motifIndex)
                                 throws Exception
Returns the number of sequences in data that are predicted to be bound at least once by the motif with index motifIndex.

Parameters:
data - the data
weights - the weights of the data
motifIndex - the index of the motif
Returns:
the number of sequences in data bound by the motif with index motifIndex
Throws:
Exception - if the background sample for the prediction could not be created or some of the scores could not be computed
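Because the method receives weights and returns a double, a natural reading is that it sums the weights of the bound sequences rather than counting them. A sketch under that assumption, calling a sequence bound if at least one score in its profile is strictly greater than the threshold and treating null weights as 1; BoundSequenceCounter is hypothetical.

```java
// Hypothetical sketch: weighted count of sequences with at least one significant hit.
final class BoundSequenceCounter {

    static double weightedBound(double[][] profiles, double[] weights, double threshold) {
        double total = 0.0;
        for (int n = 0; n < profiles.length; n++) {
            boolean bound = false;
            for (double score : profiles[n]) {
                if (score > threshold) {
                    bound = true;
                    break; // one significant hit is enough
                }
            }
            if (bound) {
                total += (weights == null) ? 1.0 : weights[n];
            }
        }
        return total;
    }
}
```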


public double getOffsetForAucPR()
This method returns an offset that must be added to the scores when computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated with oneHistogram=true, the method getValuesForEachNucleotide(DataSet, int, boolean) returns scores, and no offset is needed. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve, and the offset is 1.

Returns:
the offset
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)


public double getFactorForAucPR()
This method returns a factor by which the scores must be multiplied when computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated with oneHistogram=true, the method getValuesForEachNucleotide(DataSet, int, boolean) returns scores, and a factor of 1 is appropriate. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve, and the factor is -1.

Returns:
the factor
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)
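Taken together, the offset and the factor describe the affine map offset + factor * value, applied to the output of getValuesForEachNucleotide(DataSet, int, boolean) so that larger values mean stronger evidence for a motif. With offset 0 and factor 1 the scores pass through unchanged; with offset 1 and factor -1 a p-value p becomes 1-p. Multiplying first and adding second is the order that reproduces 1-(p-value). PrScoreTransform is a hypothetical helper, not part of Jstacs.

```java
// Hypothetical helper: apply offset + factor * value to every per-position value.
final class PrScoreTransform {

    static double[][] transform(double[][] values, double offset, double factor) {
        double[][] out = new double[values.length][];
        for (int n = 0; n < values.length; n++) {
            out[n] = new double[values[n].length];
            for (int i = 0; i < values[n].length; i++) {
                out[n][i] = offset + factor * values[n][i]; // multiply first, then add
            }
        }
        return out;
    }
}
```

Usage: pass the results of getOffsetForAucPR() and getFactorForAucPR() as offset and factor, so the same code path handles both score and p-value outputs.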


public double[][] getValuesForEachNucleotide(DataSet data,
                                             int motif,
                                             boolean addOnlyBest)
                                      throws Exception
This method determines, for each possible starting position in each of the sequences in data, a score for the event that this position is covered by at least one occurrence of the motif with index motif. If the SignificantMotifOccurrencesFinder was constructed with oneHistogram=true, the returned values are arbitrary scores; otherwise, they are p-values.

Parameters:
data - the DataSet
motif - the motif index
addOnlyBest - a switch whether to add only the best
Returns:
an array containing, for each sequence, an array with the scores for each starting position in the sequence
Throws:
Exception - if something went wrong during the computation of the scores of the MotifDiscoverer
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String), getOffsetForAucPR(), getFactorForAucPR()
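When the finder was not constructed with oneHistogram=true, the returned values are p-values. A common way to obtain such values is the empirical p-value: the fraction of background scores that are at least as large as the observed score. This sketch assumes that definition (without a pseudocount) and keeps the background scores sorted for binary search; EmpiricalPValue is hypothetical, and Jstacs may compute its p-values differently, for example from a binned histogram.

```java
import java.util.Arrays;

// Hypothetical sketch: empirical p-values of scores against a background sample.
final class EmpiricalPValue {

    private final double[] sortedBg;

    EmpiricalPValue(double[] backgroundScores) {
        if (backgroundScores.length == 0) {
            throw new IllegalArgumentException("empty background");
        }
        sortedBg = backgroundScores.clone();
        Arrays.sort(sortedBg);
    }

    /** Fraction of background scores that are >= score. */
    double pValue(double score) {
        // Lower bound: first index whose background value is >= score.
        int lo = 0;
        int hi = sortedBg.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBg[mid] < score) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (sortedBg.length - lo) / (double) sortedBg.length;
    }

    /** Converts a per-position score profile into per-position p-values. */
    double[] pValues(double[] profile) {
        double[] p = new double[profile.length];
        for (int i = 0; i < profile.length; i++) {
            p[i] = pValue(profile[i]);
        }
        return p;
    }
}
```

Smaller p-values indicate stronger evidence, which is why the offset 1 and factor -1 described for getOffsetForAucPR() and getFactorForAucPR() are needed before such values enter a PR curve.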


public MotifDiscoverer getMotifDiscoverer()
                                   throws CloneNotSupportedException
This method returns a clone of the internally used MotifDiscoverer.

Returns:
a clone of the internally used MotifDiscoverer
Throws:
CloneNotSupportedException - if the MotifDiscoverer cannot be cloned correctly