de.jstacs.motifDiscovery
Class SignificantMotifOccurrencesFinder

java.lang.Object
  extended by de.jstacs.motifDiscovery.SignificantMotifOccurrencesFinder

public class SignificantMotifOccurrencesFinder
extends Object

This class enables the user to predict motif occurrences given a specific significance level.

Author:
Jan Grau, Jens Keilwagen

Nested Class Summary
static class SignificantMotifOccurrencesFinder.RandomSeqType
           
 
Constructor Summary
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, Sample bg, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a Sample to determine the significance level.
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, boolean oneHistogram, int numSequences, double sign)
          This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.
 
Method Summary
 Sample annotateMotifs(int startPos, Sample data)
          This method annotates a Sample starting in each sequence at startPos.
 Sample annotateMotifs(int startPos, Sample data, int addMax)
          This method annotates a Sample starting in each sequence at startPos.
 Sample annotateMotifs(Sample data)
          This method annotates a Sample.
 Sample annotateMotifs(Sample data, int addMax)
          This method annotates a Sample.
 MotifAnnotation[] findSignificantMotifOccurrences(int motif, Sequence seq, int start)
          This method finds the significant motif occurrences in the sequence.
 Sample getBindingSites(int startPos, Sample data, int addMax, int addLeft, int addRight)
          This method returns a Sample containing the predicted binding sites.
 Sample getBindingSites(Sample data)
          This method returns a Sample containing the predicted binding sites.
 double getFactorForAucPR()
          This method returns a factor by which scores must be multiplied for computing PR curves.
 int getNumberOfBoundSequences(Sample data, int motifIndex)
          Returns the number of sequences in data that are predicted to be bound at least once by the motif with index motifIndex.
 double getOffsetForAucPR()
          This method returns an offset that must be added to scores for computing PR curves.
 double[][] getValuesForEachNucleotide(Sample data, int component, int motif, boolean addOnlyBest)
          This method determines a value for each symbol that is annotated in at least one occurrence of the motif with index motif in the component with index component.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.RandomSeqType type,
                                         boolean oneHistogram,
                                         int numSequences,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
type - the type that determines how the significance level is determined
oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms
numSequences - the number of sampled sequence instances used to determine the significance level
sign - the significance level

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         Sample bg,
                                         double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a Sample to determine the significance level.

Parameters:
disc - the MotifDiscoverer for the prediction
bg - the background data set
sign - the significance level
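
The role of a background data set and a significance level can be illustrated without Jstacs. The following is only a minimal sketch under stated assumptions, not the library's implementation: EmpiricalSignificanceSketch, pValue and isSignificant are hypothetical names, background scores are assumed to be plain double values, and an occurrence is assumed to be significant if its empirical p-value does not exceed sign.

```java
class EmpiricalSignificanceSketch {

    // Empirical p-value: fraction of background scores that are at least
    // as large as the observed score (assumption: higher score = better).
    static double pValue(double[] backgroundScores, double score) {
        int atLeastAsLarge = 0;
        for (double b : backgroundScores) {
            if (b >= score) {
                atLeastAsLarge++;
            }
        }
        return (double) atLeastAsLarge / backgroundScores.length;
    }

    // Assumption: an occurrence counts as significant if its empirical
    // p-value does not exceed the significance level.
    static boolean isSignificant(double[] backgroundScores, double score, double sign) {
        return pValue(backgroundScores, score) <= sign;
    }

    public static void main(String[] args) {
        double[] bg = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(pValue(bg, 9.0));
        System.out.println(isSignificant(bg, 9.0, 0.2));
    }
}
```

In this sketch a smaller sign admits fewer occurrences, and a larger background set gives a finer resolution of attainable p-values.
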
Method Detail

findSignificantMotifOccurrences

public MotifAnnotation[] findSignificantMotifOccurrences(int motif,
                                                         Sequence seq,
                                                         int start)
                                                  throws Exception
This method finds the significant motif occurrences in the sequence.

Parameters:
motif - the motif index
seq - the sequence
start - the start position
Returns:
an array of MotifAnnotation for the sequence
Throws:
Exception - if the background sample could not be created, or some of the scores could not be computed

annotateMotifs

public Sample annotateMotifs(Sample data)
                      throws Exception
This method annotates a Sample.

Parameters:
data - the Sample
Returns:
an annotated Sample
Throws:
Exception - if something went wrong
See Also:
annotateMotifs(int, Sample, int)

annotateMotifs

public Sample annotateMotifs(int startPos,
                             Sample data)
                      throws Exception
This method annotates a Sample starting in each sequence at startPos.

Parameters:
startPos - the start position used for all sequences
data - the Sample
Returns:
an annotated Sample
Throws:
Exception - if something went wrong
See Also:
annotateMotifs(int, Sample, int)

annotateMotifs

public Sample annotateMotifs(Sample data,
                             int addMax)
                      throws Exception
This method annotates a Sample. At most, addMax motif occurrences of each motif instance will be annotated.

Parameters:
data - the Sample
addMax - the number of motif occurrences that can at most be annotated for each motif instance
Returns:
an annotated Sample
Throws:
Exception - if something went wrong
See Also:
annotateMotifs(int, Sample, int)

annotateMotifs

public Sample annotateMotifs(int startPos,
                             Sample data,
                             int addMax)
                      throws Exception
This method annotates a Sample starting in each sequence at startPos. At most, addMax motif occurrences of each motif instance will be annotated.

Parameters:
startPos - the start position used for all sequences
data - the Sample
addMax - the number of motif occurrences that can at most be annotated for each motif instance
Returns:
an annotated Sample
Throws:
Exception - if something went wrong
See Also:
annotateMotifs(int, Sample, int)
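
The effect of an upper bound such as addMax can be sketched independently of Jstacs. The code below is an illustration under assumptions, not the library's code: AddMaxSketch, Occurrence and keepBest are hypothetical names, and it is assumed that the highest-scoring occurrences are the ones retained when more than addMax candidates exist.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AddMaxSketch {

    // Hypothetical stand-in for a scored motif occurrence (not a Jstacs class).
    static final class Occurrence {
        final int position;
        final double score;

        Occurrence(int position, double score) {
            this.position = position;
            this.score = score;
        }
    }

    // Returns at most addMax occurrences, ordered by descending score.
    // Assumption: "best" means highest score.
    static List<Occurrence> keepBest(List<Occurrence> occurrences, int addMax) {
        List<Occurrence> sorted = new ArrayList<>(occurrences);
        sorted.sort(Comparator.comparingDouble((Occurrence o) -> o.score).reversed());
        int keep = Math.max(0, Math.min(addMax, sorted.size()));
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Occurrence> candidates = new ArrayList<>();
        candidates.add(new Occurrence(0, 1.5));
        candidates.add(new Occurrence(4, 3.0));
        candidates.add(new Occurrence(9, 2.0));
        for (Occurrence o : keepBest(candidates, 2)) {
            System.out.println(o.position + "\t" + o.score);
        }
    }
}
```
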

getBindingSites

public Sample getBindingSites(Sample data)
                       throws Exception
This method returns a Sample containing the predicted binding sites.

Parameters:
data - the Sample
Returns:
a Sample containing the predicted binding sites
Throws:
Exception - if something went wrong

getBindingSites

public Sample getBindingSites(int startPos,
                              Sample data,
                              int addMax,
                              int addLeft,
                              int addRight)
                       throws Exception
This method returns a Sample containing the predicted binding sites.

Parameters:
startPos - the start position used for all sequences
data - the Sample
addMax - the number of motif occurrences that can at most be annotated for each motif instance
addLeft - number of positions added to the left of the predicted motif occurrence
addRight - number of positions added to the right of the predicted motif occurrence
Returns:
a Sample containing the predicted binding sites
Throws:
Exception - if something went wrong
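
How addLeft and addRight extend a predicted occurrence can be sketched on plain strings. This is a hedged illustration only: FlankSketch and extractSite are hypothetical names, sequences are assumed to be Java strings with 0-based positions, and clamping at the sequence ends is an assumption about boundary handling that the documentation above does not specify.

```java
class FlankSketch {

    // Extracts the occurrence starting at `start` with length `motifLength`,
    // extended by addLeft positions to the left and addRight to the right.
    // Assumption: flanks that would leave the sequence are clamped to its ends.
    static String extractSite(String seq, int start, int motifLength, int addLeft, int addRight) {
        int from = Math.max(0, start - addLeft);
        int to = Math.min(seq.length(), start + motifLength + addRight);
        return seq.substring(from, to);
    }

    public static void main(String[] args) {
        System.out.println(extractSite("ACGTACGTAC", 3, 4, 2, 1));
    }
}
```
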

getNumberOfBoundSequences

public int getNumberOfBoundSequences(Sample data,
                                     int motifIndex)
                              throws Exception
Returns the number of sequences in data that are predicted to be bound at least once by the motif with index motifIndex.

Parameters:
data - the data
motifIndex - the index of the motif
Returns:
the number of sequences in data bound by the motif with index motifIndex
Throws:
Exception - if the background sample for the prediction could not be created or some of the scores could not be computed
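
The notion of "bound at least once" can be sketched as a simple count. The example is a sketch under assumptions rather than the library's procedure: BoundCountSketch and countBound are hypothetical names, each sequence is assumed to be represented by an array of per-occurrence p-values, and a sequence is assumed to be bound if any of its p-values does not exceed the significance level.

```java
class BoundCountSketch {

    // Counts the sequences with at least one p-value <= sign.
    // Each inner array holds the p-values of the candidate occurrences
    // in one sequence (hypothetical representation).
    static int countBound(double[][] pValuesPerSequence, double sign) {
        int bound = 0;
        for (double[] pValues : pValuesPerSequence) {
            for (double p : pValues) {
                if (p <= sign) {
                    bound++;
                    break; // one significant occurrence is enough
                }
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        double[][] pValues = {{0.5, 0.01, 0.02}, {0.3, 0.9}, {}, {0.05}};
        System.out.println(countBound(pValues, 0.05));
    }
}
```
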

getOffsetForAucPR

public double getOffsetForAucPR()
This method returns an offset that must be added to scores for computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(Sample, int, int, boolean) returns scores and no offset is needed. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve, and the offset is 1.

Returns:
the offset
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(Sample, double[][], double, double, String)

getFactorForAucPR

public double getFactorForAucPR()
This method returns a factor by which scores must be multiplied for computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(Sample, int, int, boolean) returns scores and a factor of 1 is appropriate. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve, and the factor is -1.

Returns:
the factor
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(Sample, double[][], double, double, String)
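
The interplay of offset and factor can be written as transformed = offset + factor * value. The sketch below only illustrates this arithmetic: AucPrTransformSketch and its methods are hypothetical names, and "no offset is needed" is assumed to correspond to an offset of 0.

```java
class AucPrTransformSketch {

    // Assumption: "no offset" for scores corresponds to 0; for p-values the offset is 1.
    static double offsetFor(boolean oneHistogram) {
        return oneHistogram ? 0.0 : 1.0;
    }

    // Factor 1 for scores, -1 for p-values.
    static double factorFor(boolean oneHistogram) {
        return oneHistogram ? 1.0 : -1.0;
    }

    // Applies offset and factor so that larger values always mean
    // stronger evidence: scores stay unchanged, p-values become 1 - p.
    static double transform(double value, boolean oneHistogram) {
        return offsetFor(oneHistogram) + factorFor(oneHistogram) * value;
    }

    public static void main(String[] args) {
        System.out.println(transform(3.5, true));   // score, left unchanged
        System.out.println(transform(0.25, false)); // p-value, mapped to 1 - p
    }
}
```
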

getValuesForEachNucleotide

public double[][] getValuesForEachNucleotide(Sample data,
                                             int component,
                                             int motif,
                                             boolean addOnlyBest)
                                      throws Exception
This method determines a value for each symbol that is annotated in at least one occurrence of the motif with index motif in the component with index component.

Parameters:
data - the Sample
component - the component index
motif - the motif index
addOnlyBest - a switch whether to add only the best
Returns:
an array containing for each sequence an array with the value (a score or a p-value, see getOffsetForAucPR()) for each symbol in the sequence
Throws:
Exception - if something went wrong
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(Sample, double[][], double, double, String), getOffsetForAucPR(), getFactorForAucPR()