| Constructor and Description |
|---|
| KMereStatistic(DataSet data, int k): This constructor creates an internal statistic counting all k-mers in the data. |
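To make "counting all k-mers in the data" concrete, the following is a minimal stand-alone sketch. It does not use Jstacs: plain strings stand in for a DataSet, and the class name KMerCountSketch and the method countKMers are hypothetical names chosen for illustration only. How the constructor stores its counts internally is not stated here, so a sorted map is simply an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class KMerCountSketch {

    /**
     * Counts every overlapping k-mer in the given sequences.
     * Plain strings stand in for the sequences of a DataSet (assumption).
     */
    static Map<String, Integer> countKMers(List<String> sequences, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String seq : sequences) {
            // Slide a window of length k over the sequence.
            for (int i = 0; i + k <= seq.length(); i++) {
                counts.merge(seq.substring(i, i + k), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {AC=2, CG=2, GT=2, TA=1}
        System.out.println(countKMers(List.of("ACGTAC", "CGT"), 2));
    }
}
```

Sequences shorter than k contribute nothing, so the result may be empty.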
| Modifier and Type | Method and Description |
|---|---|
| static DataSet.WeightedDataSetFactory | getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands): This method enables the user to get a statistic over all k-mers in the data. |
| static DataSet.WeightedDataSetFactory | getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands, DataSet.WeightedDataSetFactory.SortOperation sortOp): This method enables the user to get a statistic over all k-mers in the data. |
| static Sequence[] | getCommonString(DataSet data, int motifLength, boolean bothStrands): This method returns an array of sequences of length motifLength so that each string is contained in all sequences of the data set, more precisely in the data set or the reverse complementary data set. |
| static LinkedList<Sequence> | getConservedPatterns(Hashtable<Sequence,BitSet[]> statistic, int dataSetIndex, int threshold): This method returns a list of Sequences. |
| static Pair<Sequence,BitSet[]>[] | getKmereSequenceStatistic(boolean bothStrands, int maxMismatch, HashSet<Sequence> filter, DataSet... data): This method enables the user to get a statistic for a set of k-mers. |
| static Hashtable<Sequence,BitSet[]> | getKmereSequenceStatistic(int k, boolean bothStrands, int addIndex, DataSet... data): This method enables the user to get a statistic over all k-mers in the sequences. |
| double[][] | getSmoothedProfile(int window, Sequence... seq): This method returns an array of smoothed profiles. |
| double[][] | getSmoothedProfile(int window, String... kmere): This method returns an array of smoothed profiles. |
| static Hashtable<Sequence,BitSet[]> | merge(Hashtable<Sequence,BitSet[]> statistic, int maximalMissmatch, boolean bothStrands): This method merges the statistics of k-mers by allowing mismatches. |
| static Hashtable<Sequence,BitSet[]> | removeBackground(Hashtable<Sequence,BitSet[]> statistic, int fgIndex, int bgIndex, double fgWeight, double bgWeight): This method removes those entries from the statistic that have a lower weighted foreground cardinality than the weighted background cardinality. |
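The Hashtable<Sequence,BitSet[]> statistics listed above hold, for each k-mer, one BitSet per data set with one bit per sequence. The following is a minimal stand-alone sketch of that data layout and of a threshold filter in the spirit of getConservedPatterns. It does not call Jstacs: strings stand in for Sequence objects, all class and method names are hypothetical, and picking the lexicographically smaller of a k-mer and its reverse complement when both strands are used is an assumption, since the page does not say which of the two is kept.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class KMerOccurrenceSketch {

    /** Reverse complement of a DNA string over A, C, G, T. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default: throw new IllegalArgumentException("not a DNA symbol: " + s.charAt(i));
            }
        }
        return sb.toString();
    }

    /**
     * For each occurring k-mer, builds one BitSet per data set:
     * bit n of BitSet d is set if sequence n of data set d contains the k-mer.
     */
    static Map<String, BitSet[]> occurrenceStatistic(int k, boolean bothStrands,
                                                     List<List<String>> dataSets) {
        Map<String, BitSet[]> stat = new HashMap<>();
        for (int d = 0; d < dataSets.size(); d++) {
            List<String> seqs = dataSets.get(d);
            for (int n = 0; n < seqs.size(); n++) {
                String seq = seqs.get(n);
                for (int i = 0; i + k <= seq.length(); i++) {
                    String kmer = seq.substring(i, i + k);
                    if (bothStrands) {
                        // Assumption: keep the lexicographically smaller representative.
                        String rc = reverseComplement(kmer);
                        if (rc.compareTo(kmer) < 0) {
                            kmer = rc;
                        }
                    }
                    BitSet[] sets = stat.computeIfAbsent(kmer, key -> {
                        BitSet[] fresh = new BitSet[dataSets.size()];
                        for (int j = 0; j < fresh.length; j++) {
                            fresh[j] = new BitSet();
                        }
                        return fresh;
                    });
                    sets[d].set(n);
                }
            }
        }
        return stat;
    }

    /** Returns the k-mers occurring in more than threshold sequences of the chosen data set. */
    static List<String> conservedPatterns(Map<String, BitSet[]> stat, int dataSetIndex, int threshold) {
        List<String> result = new LinkedList<>();
        for (Map.Entry<String, BitSet[]> e : stat.entrySet()) {
            if (e.getValue()[dataSetIndex].cardinality() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(List.of("ACGT", "TACG"), List.of("GGGG"));
        Map<String, BitSet[]> stat = occurrenceStatistic(2, false, data);
        // 2-mers found in more than one sequence of data set 0 (order not guaranteed).
        System.out.println(conservedPatterns(stat, 0, 1));
    }
}
```

Because each bit records presence rather than a count, a k-mer that occurs several times in one sequence still contributes only one to the cardinality.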
public KMereStatistic(DataSet data, int k)
This constructor creates an internal statistic counting all k-mers in the data.
Parameters:
data - the data
k - the number of symbols in each counted word

public double[][] getSmoothedProfile(int window, String... kmere)
This method returns an array of smoothed profiles.
Parameters:
window - the window length; for no smoothing use 1
kmere - the k-mers
See Also:
getSmoothedProfile(int, Sequence...), Sequence.create(AlphabetContainer, String)

public double[][] getSmoothedProfile(int window, Sequence... seq)
This method returns an array of smoothed profiles.
Parameters:
window - the window length; for no smoothing use 1
seq - the Sequence instances containing the k-mers

public static Sequence[] getCommonString(DataSet data, int motifLength, boolean bothStrands) throws Exception
This method returns an array of sequences of length motifLength so that each string is contained in all sequences of the data set, more precisely in the data set or the reverse complementary data set.
Parameters:
data - the data set of sequences
motifLength - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false)
Returns:
an array of sequences of length motifLength so that each sequence is contained in data on either strand
Throws:
Exception - if something went wrong

public static DataSet.WeightedDataSetFactory getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands) throws Exception
This method enables the user to get a statistic over all k-mers in the data. That is, it counts the occurrences of each k-mer in the complete data.
Parameters:
data - the data set of sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned DataSet.WeightedDataSetFactory.
Returns:
a DataSet.WeightedDataSetFactory containing all k-mers and their absolute frequencies in data or, respectively, on one strand of the data
Throws:
Exception - if something went wrong
See Also:
getAbsoluteKMereFrequencies(DataSet, int, boolean, DataSet.WeightedDataSetFactory.SortOperation), DataSet.WeightedDataSetFactory.SortOperation.NO_SORT

public static DataSet.WeightedDataSetFactory getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands, DataSet.WeightedDataSetFactory.SortOperation sortOp) throws Exception
This method enables the user to get a statistic over all k-mers in the data. That is, it counts the occurrences of each k-mer in the complete data.
Parameters:
data - the data set of sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned DataSet.WeightedDataSetFactory.
sortOp - how the result should be sorted
Returns:
a DataSet.WeightedDataSetFactory containing all k-mers and their absolute frequencies in data or, respectively, on one strand of the data
Throws:
Exception - if something went wrong

public static Hashtable<Sequence,BitSet[]> getKmereSequenceStatistic(int k, boolean bothStrands, int addIndex, DataSet... data) throws WrongAlphabetException, OperationNotSupportedException
This method enables the user to get a statistic over all k-mers in the sequences. That is, it creates for each occurring k-mer an array of BitSets indicating for each data set and each sequence whether it contains the k-mer (or its reverse complement) or not.
Parameters:
data - the DataSets of Sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned statistic.
addIndex - the maximal index for inserting new k-mers
Returns:
a Hashtable on Sequences and arrays of BitSets; each entry encodes a k-mer and the occurrence of this k-mer in each data set and sequence; if a k-mer occurs in data set d in sequence n, the n-th bit of the d-th BitSet is true.
Throws:
WrongAlphabetException - if the AlphabetContainers of the DataSets do not match or if they are not simple and discrete
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
See Also:
Hashtable, merge(Hashtable, int, boolean)

public static Pair<Sequence,BitSet[]>[] getKmereSequenceStatistic(boolean bothStrands, int maxMismatch, HashSet<Sequence> filter, DataSet... data) throws WrongAlphabetException, OperationNotSupportedException
This method enables the user to get a statistic for a set of k-mers. That is, it creates for each k-mer from filter an array of BitSets indicating for each data set and each sequence whether it contains the k-mer (or its reverse complement) or not.
Parameters:
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned statistic.
maxMismatch - the maximal number of mismatches
filter - a filter containing all interesting k-mers
data - the DataSets of Sequences
Returns:
an array of Pairs of Sequences and arrays of BitSets; each entry encodes a k-mer and the occurrence of this k-mer in each data set and sequence; if a k-mer occurs in data set d in sequence n, the n-th bit of the d-th BitSet is true.
Throws:
WrongAlphabetException - if the AlphabetContainers of the DataSets do not match or if they are not simple and discrete
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
See Also:
Hashtable, merge(Hashtable, int, boolean)

public static Hashtable<Sequence,BitSet[]> merge(Hashtable<Sequence,BitSet[]> statistic, int maximalMissmatch, boolean bothStrands) throws OperationNotSupportedException, CloneNotSupportedException, WrongLengthException, WrongAlphabetException
This method merges the statistics of k-mers by allowing mismatches.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...)
maximalMissmatch - the maximal number of allowed mismatches
bothStrands - the switch for using both strands (true) or only the forward strand (false)
Throws:
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
CloneNotSupportedException - if an array of BitSets cannot be cloned
WrongAlphabetException - see Sequence.getHammingDistance(Sequence)
WrongLengthException - see Sequence.getHammingDistance(Sequence)
See Also:
Sequence.getHammingDistance(Sequence), getKmereSequenceStatistic(int, boolean, int, DataSet...)

public static LinkedList<Sequence> getConservedPatterns(Hashtable<Sequence,BitSet[]> statistic, int dataSetIndex, int threshold)
This method returns a list of Sequences. Each entry corresponds to a sequence or a set of sequences (depending on the input statistic) that occurs in more than threshold Sequences of the data set.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...) or merge(Hashtable, int, boolean)
dataSetIndex - the index of the BitSet to be used
threshold - a threshold that BitSet.cardinality() has to exceed for an entry to be declared a conserved pattern
See Also:
getKmereSequenceStatistic(int, boolean, int, DataSet...), merge(Hashtable, int, boolean)

public static Hashtable<Sequence,BitSet[]> removeBackground(Hashtable<Sequence,BitSet[]> statistic, int fgIndex, int bgIndex, double fgWeight, double bgWeight)
This method removes those entries from the statistic that have a lower weighted foreground cardinality than the weighted background cardinality.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...) or merge(Hashtable, int, boolean)
fgIndex - the foreground index of the BitSet to be used
bgIndex - the background index of the BitSet to be used
fgWeight - the weight used to weight the foreground cardinality
bgWeight - the weight used to weight the background cardinality
Returns:
a Hashtable containing only the positive entries
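
The comparison of weighted cardinalities described for removeBackground can be sketched without Jstacs as follows. This is an illustration under stated assumptions, not the library's implementation: strings stand in for Sequence keys, the names BackgroundFilterSketch, filterByWeightedCardinality and bits are hypothetical, ties are kept because the description only speaks of a "lower" foreground value, and returning a new map instead of modifying the input is likewise an assumption.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class BackgroundFilterSketch {

    /**
     * Keeps an entry unless its weighted foreground cardinality is lower
     * than its weighted background cardinality. The input map is not modified.
     */
    static Map<String, BitSet[]> filterByWeightedCardinality(Map<String, BitSet[]> statistic,
                                                             int fgIndex, int bgIndex,
                                                             double fgWeight, double bgWeight) {
        Map<String, BitSet[]> kept = new HashMap<>();
        for (Map.Entry<String, BitSet[]> e : statistic.entrySet()) {
            double fg = fgWeight * e.getValue()[fgIndex].cardinality();
            double bg = bgWeight * e.getValue()[bgIndex].cardinality();
            if (fg >= bg) { // assumption: ties count as positive entries
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Convenience helper: a BitSet with the given bit indices set. */
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet[]> stat = new HashMap<>();
        // "ACG": in 3 foreground sequences and 1 background sequence.
        stat.put("ACG", new BitSet[] { bits(0, 1, 2), bits(0) });
        // "TTT": in 1 foreground sequence and 4 background sequences.
        stat.put("TTT", new BitSet[] { bits(0), bits(0, 1, 2, 3) });
        // With equal weights only "ACG" survives.
        System.out.println(filterByWeightedCardinality(stat, 0, 1, 1.0, 1.0).keySet());
    }
}
```

One plausible use of the two weights is to normalize by data set size, for example the reciprocal of the number of sequences in the foreground and background data sets, so that the comparison is between fractions of sequences rather than raw counts.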