public class PFMComparator extends Object
Modifier and Type | Class and Description |
---|---|
static class | PFMComparator.NormalizedEuclideanDistance - This class implements the normalized Euclidean distance. |
static class | PFMComparator.OneMinusPearsonCorrelationCoefficient - This class implements one minus the Pearson correlation coefficient. |
static class | PFMComparator.PFMDistance - This class declares a method for comparing different PFMs. |
static class | PFMComparator.SymmetricKullbackLeiblerDivergence - This class implements the symmetric Kullback-Leibler divergence. |
static class | PFMComparator.UniformBorderWrapper - Wraps a given PFMComparator.PFMDistance and pads the considered PFMs with uniformly distributed positions. |
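
The distance classes above are only summarized by name. As a rough illustration, the following Java sketch computes textbook versions of the three measures between two PFM rows (one position each), converting counts to probabilities first, and averages a row-wise measure over the positions of two equally long PFMs. It is not the Jstacs implementation: the class PfmDistanceSketch is hypothetical, and the scaling of the Euclidean distance by sqrt(2), the pseudocount in the Kullback-Leibler divergence and the summing of both divergence directions are assumptions.

```java
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch of row-wise PFM distances; NOT the Jstacs implementation.
 * PFMs are indexed [position][symbol]; each row is turned into probabilities first.
 */
final class PfmDistanceSketch {

    private PfmDistanceSketch() {}

    /** Divides a row of counts by its sum (the row sum must be positive). */
    static double[] toProbabilities(double[] row) {
        double sum = 0;
        for (double v : row) sum += v;
        double[] p = new double[row.length];
        for (int i = 0; i < row.length; i++) p[i] = row[i] / sum;
        return p;
    }

    /** Euclidean distance of the probability rows, divided by its maximum sqrt(2) to lie in [0, 1]. */
    static double normalizedEuclidean(double[] a, double[] b) {
        double[] p = toProbabilities(a), q = toProbabilities(b);
        double s = 0;
        for (int i = 0; i < p.length; i++) s += (p[i] - q[i]) * (p[i] - q[i]);
        return Math.sqrt(s) / Math.sqrt(2.0);
    }

    /** 1 - Pearson correlation of the probability rows; NaN if either row is uniform. */
    static double oneMinusPearson(double[] a, double[] b) {
        double[] p = toProbabilities(a), q = toProbabilities(b);
        int n = p.length;
        double meanP = 0, meanQ = 0;
        for (int i = 0; i < n; i++) { meanP += p[i]; meanQ += q[i]; }
        meanP /= n;
        meanQ /= n;
        double cov = 0, varP = 0, varQ = 0;
        for (int i = 0; i < n; i++) {
            cov += (p[i] - meanP) * (q[i] - meanQ);
            varP += (p[i] - meanP) * (p[i] - meanP);
            varQ += (q[i] - meanQ) * (q[i] - meanQ);
        }
        return 1.0 - cov / Math.sqrt(varP * varQ);
    }

    /** D(p||q) + D(q||p) = sum of (p - q) * ln(p / q); the pseudocount avoids log(0). */
    static double symmetricKullbackLeibler(double[] a, double[] b, double pseudocount) {
        double[] pa = a.clone(), pb = b.clone();
        for (int i = 0; i < pa.length; i++) {
            pa[i] += pseudocount;
            pb[i] += pseudocount;
        }
        double[] p = toProbabilities(pa), q = toProbabilities(pb);
        double d = 0;
        for (int i = 0; i < p.length; i++) d += (p[i] - q[i]) * Math.log(p[i] / q[i]);
        return d;
    }

    /** Averages a row-wise measure over all positions of two PFMs of equal length. */
    static double averageOverPositions(double[][] m1, double[][] m2,
                                       ToDoubleBiFunction<double[], double[]> rowDistance) {
        if (m1.length != m2.length) throw new IllegalArgumentException("PFMs differ in length");
        double total = 0;
        for (int i = 0; i < m1.length; i++) total += rowDistance.applyAsDouble(m1[i], m2[i]);
        return total / m1.length;
    }
}
```

For example, averageOverPositions(m1, m2, PfmDistanceSketch::normalizedEuclidean) yields 0 for identical matrices and 1 when every position of one PFM puts all its mass on a symbol that the other PFM never uses.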
Constructor and Description |
---|
PFMComparator() |
Modifier and Type | Method and Description |
---|---|
static ComparableElement<String,Double>[] | find(ComplementableDiscreteAlphabet abc, double[][] pfm, ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs, PFMComparator.PFMDistance distance, int minOverlap, int allowMiniShift, boolean pValues, double threshold) - This method finds PFMs in a list of known PFMs that are similar to the user-specified PFM pfm. |
static String | getConsensus(AlphabetContainer con, double[][] pfm) - This method extracts the consensus sequence of the given PFM. The method does not use any degenerate IUPAC code. |
static double[] | getCounts(DataSet... data) - This method counts the occurrences of symbols in the given data sets. |
static double[][] | getPFM(DataSet data) |
static double[][] | getPFM(DataSet data, double[] weights) |
static double[][] | getPFM(DataSet data, int start, int end) - Returns a position frequency matrix (PFM, rows=positions, columns=symbols) for the given subset of DataSet. |
static double[][] | getPWM(DataSet data, int start, int end) - Returns a position weight matrix (PWM, rows=positions, columns=symbols, containing probabilities) for the given subset of DataSet. |
static double[][] | getReverseComplement(ComplementableDiscreteAlphabet abc, double[][] pfm) - This method returns the PFM that is the reverse complement of the given PFM. |
static String | matrixToString(double[][] matrix) - Returns a string representation of the matrix, where each row of the matrix is printed on one line and columns are separated by tab stops. |
static void | normalize(double[] counts) - This method normalizes an array containing counts. |
static AbstractMap.SimpleEntry<String,double[][]> | readPFMFromUniprobe(String annotPrefix, File file) - Reads a PFM from the UniProbe format. |
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromEMBL(String fileName, int max) - This method reads a number of PFMs from a file and returns them in an ArrayList together with some annotation. |
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromJasparFastA(BufferedReader r) - Reads a list of PFMs from the Jaspar format. |
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromJasparFastA(String filename) - This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation. |
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromTransfac(String fileName, int max) - Reads a set of PFMs from the Transfac format. |
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromUniprobe(String startdir) - This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation. |
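
getReverseComplement and getConsensus work on a Jstacs alphabet. For plain DNA matrices, both operations can be sketched as follows, assuming the columns are ordered A, C, G, T (so the complement of symbol index i is 3 - i) and that ties in the consensus go to the first symbol. PfmStrandSketch is a hypothetical name; the library's alphabet handling and tie-breaking may differ.

```java
/**
 * Hypothetical DNA-only sketch; NOT the Jstacs implementation.
 * Assumes columns are ordered A, C, G, T, so the complement of symbol index i is 3 - i.
 */
final class PfmStrandSketch {

    private static final char[] SYMBOLS = {'A', 'C', 'G', 'T'};

    private PfmStrandSketch() {}

    /** Reverses the position order and swaps each symbol column with its complement. */
    static double[][] reverseComplement(double[][] pfm) {
        int len = pfm.length;
        double[][] rc = new double[len][SYMBOLS.length];
        for (int pos = 0; pos < len; pos++) {
            for (int sym = 0; sym < SYMBOLS.length; sym++) {
                rc[len - 1 - pos][SYMBOLS.length - 1 - sym] = pfm[pos][sym];
            }
        }
        return rc;
    }

    /** Most frequent symbol per position (first one on ties); no IUPAC ambiguity codes. */
    static String consensus(double[][] pfm) {
        StringBuilder sb = new StringBuilder(pfm.length);
        for (double[] row : pfm) {
            int best = 0;
            for (int sym = 1; sym < row.length; sym++) {
                if (row[sym] > row[best]) best = sym;
            }
            sb.append(SYMBOLS[best]);
        }
        return sb.toString();
    }
}
```

Reverse-complementing twice returns the original matrix, which is a cheap sanity check for the index arithmetic.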

public static String matrixToString(double[][] matrix)
Parameters:
matrix - the matrix

public static String getConsensus(AlphabetContainer con, double[][] pfm)
Parameters:
con - the AlphabetContainer that was used to create the PFM
pfm - the position frequency matrix (alternatively, the position weight matrix)

public static double[] getCounts(DataSet... data)
Parameters:
data - the data sets

public static void normalize(double[] counts)
Parameters:
counts - the array of counts

public static double[][] getPFM(DataSet data)
Parameters:
data - the DataSet
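
To make the counting methods concrete, here is a sketch that builds a PFM from aligned DNA strings, normalizes an array of counts in place, and prints a matrix with one row per line and tab-separated columns. Aligned String sequences stand in for a Jstacs DataSet, and PfmCountSketch is a hypothetical class; it only mirrors the documented behavior of getPFM(DataSet), normalize(double[]) and matrixToString(double[][]).

```java
import java.util.List;

/**
 * Hypothetical sketch using aligned DNA strings in place of a Jstacs DataSet.
 * Rows of the returned matrix are positions, columns are the symbols A, C, G, T.
 */
final class PfmCountSketch {

    private static final String ALPHABET = "ACGT";

    private PfmCountSketch() {}

    /** Absolute symbol frequencies per position; all sequences must share one length. */
    static double[][] pfm(List<String> alignedSeqs) {
        if (alignedSeqs.isEmpty()) throw new IllegalArgumentException("no sequences");
        int len = alignedSeqs.get(0).length();
        double[][] pfm = new double[len][ALPHABET.length()];
        for (String seq : alignedSeqs) {
            if (seq.length() != len) throw new IllegalArgumentException("unequal lengths: " + seq);
            for (int pos = 0; pos < len; pos++) {
                int sym = ALPHABET.indexOf(seq.charAt(pos));
                if (sym < 0) throw new IllegalArgumentException("unknown symbol in " + seq);
                pfm[pos][sym]++;
            }
        }
        return pfm;
    }

    /** Rescales counts in place so they sum to one (an all-zero array becomes NaN). */
    static void normalize(double[] counts) {
        double sum = 0;
        for (double c : counts) sum += c;
        for (int i = 0; i < counts.length; i++) counts[i] /= sum;
    }

    /** One matrix row per line, columns separated by tab stops. */
    static String matrixToString(double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) sb.append('\t');
                sb.append(row[j]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```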
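
public static double[][] getPFM(DataSet data, int start, int end)
Returns a position frequency matrix (PFM, rows=positions, columns=symbols) for the given subset of DataSet.
Parameters:
data - the data set
start - the first sequence to consider
end - the first sequence not to consider

public static double[][] getPWM(DataSet data, int start, int end)
Returns a position weight matrix (PWM, rows=positions, columns=symbols, containing probabilities) for the given subset of DataSet.
Parameters:
data - the data set
start - the first sequence to consider
end - the first sequence not to consider

public static double[][] getPFM(DataSet data, double[] weights)
Parameters:
data - the DataSet
weights - the weights on the sequences in data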
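
The subset and weighted variants can be sketched the same way. The range is half-open, following the parameter documentation (start is the first sequence to consider, end the first one not to consider), and a PWM is obtained by dividing every PFM row by its sum. PfmSubsetSketch, its use of aligned strings instead of a DataSet, and the assumption that weights are indexed like the sequences are illustrative choices, not the library API.

```java
import java.util.List;

/**
 * Hypothetical sketch of the subset and weighted variants; NOT the Jstacs implementation.
 * Uses aligned DNA strings (columns A, C, G, T) instead of a DataSet.
 */
final class PfmSubsetSketch {

    private static final String ALPHABET = "ACGT";

    private PfmSubsetSketch() {}

    /**
     * Weighted absolute frequencies of sequences start (inclusive) to end (exclusive).
     * A null weights array counts every sequence once.
     */
    static double[][] pfm(List<String> seqs, int start, int end, double[] weights) {
        if (start < 0 || end > seqs.size() || start >= end) {
            throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
        }
        int len = seqs.get(start).length();
        double[][] pfm = new double[len][ALPHABET.length()];
        for (int s = start; s < end; s++) {
            String seq = seqs.get(s);
            if (seq.length() != len) throw new IllegalArgumentException("unequal lengths: " + seq);
            double w = (weights == null) ? 1.0 : weights[s];
            for (int pos = 0; pos < len; pos++) {
                int sym = ALPHABET.indexOf(seq.charAt(pos));
                if (sym < 0) throw new IllegalArgumentException("unknown symbol in " + seq);
                pfm[pos][sym] += w;
            }
        }
        return pfm;
    }

    /** Position weight matrix: every row of the unweighted PFM divided by its sum. */
    static double[][] pwm(List<String> seqs, int start, int end) {
        double[][] m = pfm(seqs, start, end, null);
        for (double[] row : m) {
            double sum = 0;
            for (double v : row) sum += v;
            for (int j = 0; j < row.length; j++) row[j] /= sum;
        }
        return m;
    }
}
```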

public static double[][] getReverseComplement(ComplementableDiscreteAlphabet abc, double[][] pfm)
Parameters:
abc - the ComplementableDiscreteAlphabet that is used to create the reverse complement
pfm - the absolute frequencies for each position and symbol

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromUniprobe(String startdir) throws IOException
This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation.
Parameters:
startdir - the directory containing the PFMs in Uniprobe format
Returns:
an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM
Throws:
IOException - if something went wrong while reading a file

public static AbstractMap.SimpleEntry<String,double[][]> readPFMFromUniprobe(String annotPrefix, File file) throws IOException
Parameters:
annotPrefix - a prefix on the PFM's annotation
file - the File to read
Throws:
IOException - if the file could not be read

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(String filename) throws IOException
This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation.
Parameters:
filename - the path to the file containing the Jaspar PFMs
Returns:
an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM
Throws:
IOException - if something went wrong while reading the file

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(BufferedReader r) throws IOException
Parameters:
r - the BufferedReader on the Jaspar file
Throws:
IOException - if the contents of r could not be read

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromEMBL(String fileName, int max) throws IOException
This method reads a number of PFMs from a file and returns them in an ArrayList together with some annotation.
Parameters:
fileName - the name of the file containing the PFMs in EMBL format
max - the maximal number of PFMs that should be read
Returns:
an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM
Throws:
IOException - if something went wrong while reading the file

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromTransfac(String fileName, int max) throws IOException
Parameters:
fileName - the Transfac file
max - the maximum number of PFMs to read
Throws:
IOException - if the file could not be read

public static ComparableElement<String,Double>[] find(ComplementableDiscreteAlphabet abc, double[][] pfm, ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs, PFMComparator.PFMDistance distance, int minOverlap, int allowMiniShift, boolean pValues, double threshold)
This method finds PFMs in a list of known PFMs that are similar to the user-specified PFM pfm.
Parameters:
abc - the alphabet of this PFM
pfm - the PFM to be compared to the known PFMs
knownPFMs - an ArrayList of known PFMs
distance - a distance measure to assess the similarity of the PFMs
minOverlap - the minimal number of consecutive positions in an alignment
allowMiniShift - the minimal number of allowed shifts (even if the minimal overlap is bigger)
pValues - whether to compute p-values for the matrix similarity
threshold - only hits with a distance below this threshold will be reported
See Also:
readPFMsFromEMBL(String, int)
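
find combines a distance measure with shifted alignments of two PFMs. The sketch below shows one way such a search can work: for every known PFM it tries all offsets that leave at least minOverlap aligned positions, on both strands, averages a normalized Euclidean column distance over the overlap, and keeps the hits below threshold sorted by distance. This is a simplified stand-in, not the Jstacs algorithm: PfmSearchSketch is hypothetical, DNA columns are assumed to be ordered A, C, G, T, and allowMiniShift, p-values and the pluggable PFMDistance are deliberately left out.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a PFM similarity search; NOT the Jstacs algorithm.
 * Assumes DNA columns ordered A, C, G, T; ignores allowMiniShift and p-values.
 */
final class PfmSearchSketch {

    private PfmSearchSketch() {}

    /** Normalized Euclidean distance between two count rows, in [0, 1]. */
    private static double columnDistance(double[] a, double[] b) {
        double sumA = 0, sumB = 0;
        for (int i = 0; i < a.length; i++) { sumA += a[i]; sumB += b[i]; }
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] / sumA - b[i] / sumB;
            s += diff * diff;
        }
        return Math.sqrt(s / 2.0);
    }

    /** Reverses the positions and swaps complementary columns (A/T, C/G). */
    private static double[][] reverseComplement(double[][] pfm) {
        int len = pfm.length;
        double[][] rc = new double[len][4];
        for (int pos = 0; pos < len; pos++) {
            for (int sym = 0; sym < 4; sym++) {
                rc[len - 1 - pos][3 - sym] = pfm[pos][sym];
            }
        }
        return rc;
    }

    /**
     * Smallest mean column distance over all shifts with at least minOverlap aligned
     * positions; POSITIVE_INFINITY if no shift qualifies.
     */
    static double bestShift(double[][] query, double[][] known, int minOverlap) {
        double best = Double.POSITIVE_INFINITY;
        // query position i is aligned with known position i + offset
        for (int offset = -(query.length - 1); offset < known.length; offset++) {
            int from = Math.max(0, -offset);
            int to = Math.min(query.length, known.length - offset);
            int overlap = to - from;
            if (overlap < minOverlap) continue;
            double sum = 0;
            for (int i = from; i < to; i++) sum += columnDistance(query[i], known[i + offset]);
            best = Math.min(best, sum / overlap);
        }
        return best;
    }

    /** Returns (annotation, distance) pairs below threshold, best hit first; both strands are tried. */
    static List<Map.Entry<String, Double>> find(double[][] pfm,
            List<AbstractMap.SimpleEntry<String, double[][]>> knownPFMs,
            int minOverlap, double threshold) {
        double[][] rc = reverseComplement(pfm);
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        for (AbstractMap.SimpleEntry<String, double[][]> known : knownPFMs) {
            double d = Math.min(bestShift(pfm, known.getValue(), minOverlap),
                                bestShift(rc, known.getValue(), minOverlap));
            if (d < threshold) hits.add(new AbstractMap.SimpleEntry<>(known.getKey(), d));
        }
        hits.sort(Map.Entry.comparingByValue()); // stable sort: ties keep input order
        return hits;
    }
}
```

Because minOverlap only bounds the overlap from below, a short query can match anywhere inside a longer known PFM; setting minOverlap to the query length requires the whole query to be aligned.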