public class PFMComparator extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | PFMComparator.NormalizedEuclideanDistance: This class implements the normalized Euclidean distance. |
| static class | PFMComparator.OneMinusPearsonCorrelationCoefficient: This class implements one minus the Pearson correlation coefficient. |
| static class | PFMComparator.PFMDistance: This class declares a method for comparing different PFMs. |
| static class | PFMComparator.SymmetricKullbackLeiblerDivergence: This class implements the symmetric Kullback-Leibler divergence. |
| static class | PFMComparator.UniformBorderWrapper: Wraps a given PFMComparator.PFMDistance and pads the considered PFMs with uniformly distributed positions. |
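The summary names three distances but does not state their formulas. The sketch below shows one common reading of each, for illustration only. It works on matrices with rows = positions and columns = symbols. Each row is first normalized to probabilities, then a per-position score is averaged over all positions. The class `PfmDistanceSketch` and its methods are hypothetical. Jstacs's own `PFMDistance` subclasses may normalize, average, or handle zero entries differently.

```java
import java.util.Arrays;

// Hypothetical sketch of three PFM distances; not the Jstacs implementation.
// Matrices: rows = positions, columns = symbols; both inputs must have the same shape.
class PfmDistanceSketch {

    // Normalizes every row (position) of a count matrix to probabilities.
    static double[][] toProbabilities(double[][] pfm) {
        double[][] p = new double[pfm.length][];
        for (int l = 0; l < pfm.length; l++) {
            double sum = Arrays.stream(pfm[l]).sum();
            p[l] = new double[pfm[l].length];
            for (int a = 0; a < pfm[l].length; a++) {
                p[l][a] = pfm[l][a] / sum;
            }
        }
        return p;
    }

    // Euclidean distance of each position, divided by sqrt(2) (the largest possible
    // distance between two probability vectors) and averaged over positions.
    static double normalizedEuclidean(double[][] pfm1, double[][] pfm2) {
        double[][] p = toProbabilities(pfm1), q = toProbabilities(pfm2);
        double total = 0;
        for (int l = 0; l < p.length; l++) {
            double squared = 0;
            for (int a = 0; a < p[l].length; a++) {
                double d = p[l][a] - q[l][a];
                squared += d * d;
            }
            total += Math.sqrt(squared) / Math.sqrt(2);
        }
        return total / p.length;
    }

    // 1 - Pearson correlation of each position, averaged over positions.
    // Undefined (NaN) if a position is uniform, because its variance is 0.
    static double oneMinusPearson(double[][] pfm1, double[][] pfm2) {
        double[][] p = toProbabilities(pfm1), q = toProbabilities(pfm2);
        double total = 0;
        for (int l = 0; l < p.length; l++) {
            int n = p[l].length;
            double meanP = Arrays.stream(p[l]).sum() / n;
            double meanQ = Arrays.stream(q[l]).sum() / n;
            double cov = 0, varP = 0, varQ = 0;
            for (int a = 0; a < n; a++) {
                cov += (p[l][a] - meanP) * (q[l][a] - meanQ);
                varP += (p[l][a] - meanP) * (p[l][a] - meanP);
                varQ += (q[l][a] - meanQ) * (q[l][a] - meanQ);
            }
            total += 1 - cov / Math.sqrt(varP * varQ);
        }
        return total / p.length;
    }

    // KL(p||q) + KL(q||p) = sum_a (p_a - q_a) * ln(p_a / q_a), averaged over positions.
    // Assumes all counts are > 0 (e.g. after adding pseudocounts).
    static double symmetricKL(double[][] pfm1, double[][] pfm2) {
        double[][] p = toProbabilities(pfm1), q = toProbabilities(pfm2);
        double total = 0;
        for (int l = 0; l < p.length; l++) {
            for (int a = 0; a < p[l].length; a++) {
                total += (p[l][a] - q[l][a]) * Math.log(p[l][a] / q[l][a]);
            }
        }
        return total / p.length;
    }

    public static void main(String[] args) {
        double[][] x = {{7, 1, 1, 1}, {1, 1, 1, 7}};
        double[][] y = {{1, 1, 1, 7}, {7, 1, 1, 1}};
        System.out.println(normalizedEuclidean(x, y));
        System.out.println(oneMinusPearson(x, y));
        System.out.println(symmetricKL(x, y));
    }
}
```

All three scores are 0 for identical matrices. The symmetric KL form is used because plain KL divergence is not symmetric in its arguments.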
| Constructor and Description |
|---|
| PFMComparator() |
| Modifier and Type | Method and Description |
|---|---|
| static ComparableElement<String,Double>[] | find(ComplementableDiscreteAlphabet abc, double[][] pfm, ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs, PFMComparator.PFMDistance distance, int minOverlap, int allowMiniShift, boolean pValues, double threshold): This method finds, for a user-specified PFM pfm, similar PFMs in a list of known PFMs. |
| static String | getConsensus(AlphabetContainer con, double[][] pfm): This method extracts the consensus sequence of the given PFM. The method does not use any degenerate IUPAC codes. |
| static double[] | getCounts(DataSet... data): This method counts the occurrences of symbols in the given data sets. |
| static double[][] | getPFM(DataSet data) |
| static double[][] | getPFM(DataSet data, double[] weights) |
| static double[][] | getPFM(DataSet data, int start, int end): Returns a position frequency matrix (PFM, rows = positions, columns = symbols) for the given subset of the DataSet. |
| static double[][] | getPWM(DataSet data, int start, int end): Returns a position weight matrix (PWM, rows = positions, columns = symbols, containing probabilities) for the given subset of the DataSet. |
| static double[][] | getReverseComplement(ComplementableDiscreteAlphabet abc, double[][] pfm): This method returns the PFM that is the reverse complement of the given PFM. |
| static String | matrixToString(double[][] matrix): Returns a string representation of the matrix, where each row of the matrix is printed on one line and columns are separated by tab stops. |
| static void | normalize(double[] counts): This method normalizes an array containing counts. |
| static AbstractMap.SimpleEntry<String,double[][]> | readPFMFromUniprobe(String annotPrefix, File file): Reads a PFM from the UniProbe format. |
| static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromEMBL(String fileName, int max): This method reads a number of PFMs from a file in EMBL format and returns them in an ArrayList together with some annotation. |
| static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromJasparFastA(BufferedReader r): Reads a list of PFMs from the Jaspar format. |
| static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromJasparFastA(String filename): This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation. |
| static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromTransfac(String fileName, int max): Reads a set of PFMs from the Transfac format. |
| static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> | readPFMsFromUniprobe(String startdir): This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation. |
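getPFM and getPWM build a matrix with rows = positions and columns = symbols. They read the sequences from start (inclusive) up to end, where end is exclusive ("the first sequence not to consider"). The PWM is the PFM with each row normalized to probabilities.

The sketch below illustrates that idea under stated assumptions. Equal-length sequences are given as plain Strings over an alphabet string such as "ACGT", standing in for a Jstacs DataSet. The class `PfmCountingSketch` and its methods are hypothetical, not the Jstacs API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: PFM/PWM from aligned, equal-length sequences.
// Strings stand in for a Jstacs DataSet; `alphabet` lists the symbols in column order.
class PfmCountingSketch {

    // pfm[l][a] = number of sequences i in [start, end) with alphabet.charAt(a) at position l.
    static double[][] getPFM(List<String> seqs, String alphabet, int start, int end) {
        int length = seqs.get(start).length();
        double[][] pfm = new double[length][alphabet.length()];
        for (int i = start; i < end; i++) {
            String s = seqs.get(i);
            for (int l = 0; l < length; l++) {
                int a = alphabet.indexOf(s.charAt(l));
                if (a < 0) {
                    throw new IllegalArgumentException("Unknown symbol: " + s.charAt(l));
                }
                pfm[l][a]++;
            }
        }
        return pfm;
    }

    // Divides every entry by the sum of the array, turning counts into probabilities.
    static void normalize(double[] counts) {
        double sum = 0;
        for (double c : counts) {
            sum += c;
        }
        for (int a = 0; a < counts.length; a++) {
            counts[a] /= sum;
        }
    }

    // PWM: the PFM of the same subset with every row (position) normalized.
    static double[][] getPWM(List<String> seqs, String alphabet, int start, int end) {
        double[][] pwm = getPFM(seqs, alphabet, start, end);
        for (double[] row : pwm) {
            normalize(row);
        }
        return pwm;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACGT", "ACGA", "TCGT");
        System.out.println(Arrays.deepToString(getPFM(seqs, "ACGT", 0, 3)));
        System.out.println(Arrays.deepToString(getPWM(seqs, "ACGT", 0, 2)));
    }
}
```

With the three example sequences, position 0 of the PFM counts two A and one T. The PWM built from only the first two sequences has probability 0.5 for both A and T at the last position.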
public static String matrixToString(double[][] matrix)

Parameters:
- matrix - the matrix

public static String getConsensus(AlphabetContainer con, double[][] pfm)

Parameters:
- con - the AlphabetContainer that was used to create the PFM
- pfm - the position frequency matrix (a position weight matrix may also be used)

public static double[] getCounts(DataSet... data)

Parameters:
- data - the data sets

public static void normalize(double[] counts)

Parameters:
- counts - the array of counts

public static double[][] getPFM(DataSet data, int start, int end)

Returns a position frequency matrix (PFM, rows = positions, columns = symbols) for the given subset of the DataSet.

Parameters:
- data - the data set
- start - the first sequence to consider
- end - the first sequence not to consider

public static double[][] getPWM(DataSet data, int start, int end)

Returns a position weight matrix (PWM, rows = positions, columns = symbols, containing probabilities) for the given subset of the DataSet.

Parameters:
- data - the data set
- start - the first sequence to consider
- end - the first sequence not to consider

public static double[][] getPFM(DataSet data, double[] weights)

Parameters:
- data - the DataSet
- weights - the weights on the sequences in data

public static double[][] getReverseComplement(ComplementableDiscreteAlphabet abc, double[][] pfm)

Parameters:
- abc - the ComplementableDiscreteAlphabet that is used to create the reverse complement
- pfm - the absolute frequencies for each position and symbol, $pfm[\ell][a] := N_{X_\ell=a}$, i.e., the number of sequences with symbol $a$ at position $\ell$
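For DNA, the reverse complement of a PFM does two things. It reverses the order of the positions, and it swaps each symbol's column with the column of its complement. A consensus without degenerate IUPAC codes simply takes the most frequent symbol at each position.

The sketch below makes a few assumptions:
- It hard-codes the column order A, C, G, T, whereas getReverseComplement takes the complement from its ComplementableDiscreteAlphabet.
- It breaks ties in favor of the first symbol. The documentation does not say how getConsensus handles ties.
- The class `PfmDnaSketch` and its methods are hypothetical.

```java
// Hypothetical DNA-only sketch (column order A, C, G, T); not the Jstacs implementation.
class PfmDnaSketch {

    static final String ALPHABET = "ACGT";

    // Complement index for the order A, C, G, T: A<->T (0<->3), C<->G (1<->2).
    static int complement(int a) {
        return 3 - a;
    }

    // Reverses the positions and moves each count to its complementary symbol's column.
    static double[][] getReverseComplement(double[][] pfm) {
        int n = pfm.length;
        double[][] rc = new double[n][4];
        for (int l = 0; l < n; l++) {
            for (int a = 0; a < 4; a++) {
                rc[n - 1 - l][complement(a)] = pfm[l][a];
            }
        }
        return rc;
    }

    // Most frequent symbol per position (ties go to the earlier symbol); no IUPAC codes.
    static String getConsensus(double[][] pfm) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : pfm) {
            int best = 0;
            for (int a = 1; a < row.length; a++) {
                if (row[a] > row[best]) {
                    best = a;
                }
            }
            sb.append(ALPHABET.charAt(best));
        }
        return sb.toString();
    }

    // One matrix row per line, columns separated by tab stops.
    static String matrixToString(double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : matrix) {
            for (int a = 0; a < row.length; a++) {
                if (a > 0) {
                    sb.append('\t');
                }
                sb.append(row[a]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] pfm = {{5, 0, 0, 1}, {0, 4, 1, 0}, {0, 0, 0, 6}};
        System.out.println(getConsensus(pfm));
        System.out.print(matrixToString(getReverseComplement(pfm)));
        System.out.println(getConsensus(getReverseComplement(pfm)));
    }
}
```

For the example matrix, the consensus is ACT, and the consensus of its reverse complement is AGT, which is the reverse complement of ACT.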
public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromUniprobe(String startdir) throws IOException

This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation.

Parameters:
- startdir - the directory containing the PFMs in Uniprobe format

Returns: an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM

Throws: IOException - if something went wrong while reading a file

public static AbstractMap.SimpleEntry<String,double[][]> readPFMFromUniprobe(String annotPrefix, File file) throws IOException

Parameters:
- annotPrefix - a prefix on the PFM's annotation
- file - the File to read

Throws: IOException

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(String filename) throws IOException

This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation.

Parameters:
- filename - the path to the file containing the Jaspar PFMs

Returns: an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM

Throws: IOException - if something went wrong while reading the file

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(BufferedReader r) throws IOException

Parameters:
- r - the BufferedReader on the Jaspar file

Throws: IOException - if the contents of r could not be read

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromEMBL(String fileName, int max) throws IOException

This method reads a number of PFMs from a file and returns them in an ArrayList together with some annotation.

Parameters:
- fileName - the name of the file containing the PFMs in EMBL format
- max - the maximal number of PFMs that should be read

Returns: an ArrayList containing AbstractMap.SimpleEntry objects with annotation and PFM

Throws: IOException - if something went wrong while reading the file

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromTransfac(String fileName, int max) throws IOException

Parameters:
- fileName - the Transfac file
- max - the maximum number of PFMs to read

Throws: IOException - if the file could not be read

public static ComparableElement<String,Double>[] find(ComplementableDiscreteAlphabet abc, double[][] pfm, ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs, PFMComparator.PFMDistance distance, int minOverlap, int allowMiniShift, boolean pValues, double threshold)

This method finds, for a user-specified PFM pfm, similar PFMs in a list of known PFMs.

Parameters:
- abc - the alphabet of this PFM
- pfm - the PFM to be compared to the known PFMs
- knownPFMs - an ArrayList of known PFMs
- distance - a distance measure to assess the similarity of the PFMs
- minOverlap - the minimal number of consecutive positions in an alignment
- allowMiniShift - the minimal number of allowed shifts (even if the minimal overlap is bigger)
- pValues - whether to compute p-values for matrix similarity
- threshold - only hits with a distance below this threshold will be reported

See Also: readPFMsFromEMBL(String, int)
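The parameters of find suggest an alignment search. The query is shifted against each known PFM, and only alignments with at least minOverlap consecutive positions are scored. Known PFMs whose distance falls below threshold are reported as hits. Because find takes a ComplementableDiscreteAlphabet, it presumably checks the reverse-complement strand as well.

The sketch below is one plausible version of that idea, not the Jstacs algorithm. It rests on these assumptions:
- Inputs are DNA probability matrices (each row sums to 1) with columns A, C, G, T.
- An alignment is scored by the mean Euclidean distance between aligned rows.
- The query's reverse complement is also tried.
- allowMiniShift and pValues are not modeled.

The class `PfmFindSketch` and its methods are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a shift-based PFM search; not the Jstacs algorithm.
// Matrices are DNA probability matrices: rows = positions (each summing to 1),
// columns = A, C, G, T.
class PfmFindSketch {

    // Reverses the positions and swaps complementary columns (A<->T, C<->G).
    static double[][] reverseComplement(double[][] m) {
        int n = m.length;
        double[][] rc = new double[n][4];
        for (int l = 0; l < n; l++) {
            for (int a = 0; a < 4; a++) {
                rc[n - 1 - l][3 - a] = m[l][a];
            }
        }
        return rc;
    }

    // Mean Euclidean distance over the overlapping positions when position l of p
    // is aligned with position l - offset of q.
    static double overlapDistance(double[][] p, double[][] q, int offset) {
        double total = 0;
        int count = 0;
        for (int l = Math.max(0, offset); l < Math.min(p.length, q.length + offset); l++) {
            double squared = 0;
            for (int a = 0; a < 4; a++) {
                double d = p[l][a] - q[l - offset][a];
                squared += d * d;
            }
            total += Math.sqrt(squared);
            count++;
        }
        return total / count;
    }

    // Smallest distance over all shifts that keep at least minOverlap positions
    // aligned; minOverlap is clamped to [1, length of the shorter matrix].
    static double bestDistance(double[][] p, double[][] q, int minOverlap) {
        int overlap = Math.max(1, Math.min(minOverlap, Math.min(p.length, q.length)));
        double best = Double.POSITIVE_INFINITY;
        for (int offset = overlap - q.length; offset <= p.length - overlap; offset++) {
            best = Math.min(best, overlapDistance(p, q, offset));
        }
        return best;
    }

    // (annotation, distance) pairs for known matrices whose best distance on either
    // strand is below threshold, sorted from most to least similar (stable sort).
    static List<AbstractMap.SimpleEntry<String, Double>> find(double[][] query,
            List<AbstractMap.SimpleEntry<String, double[][]>> known, int minOverlap, double threshold) {
        double[][] rc = reverseComplement(query);
        List<AbstractMap.SimpleEntry<String, Double>> hits = new ArrayList<>();
        for (AbstractMap.SimpleEntry<String, double[][]> entry : known) {
            double d = Math.min(bestDistance(query, entry.getValue(), minOverlap),
                    bestDistance(rc, entry.getValue(), minOverlap));
            if (d < threshold) {
                hits.add(new AbstractMap.SimpleEntry<String, Double>(entry.getKey(), d));
            }
        }
        hits.sort((x, y) -> Double.compare(x.getValue(), y.getValue()));
        return hits;
    }

    public static void main(String[] args) {
        double[][] acg = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}};
        List<AbstractMap.SimpleEntry<String, double[][]>> known = new ArrayList<>();
        known.add(new AbstractMap.SimpleEntry<String, double[][]>("CGT",
                new double[][]{{0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}));
        known.add(new AbstractMap.SimpleEntry<String, double[][]>("TTT",
                new double[][]{{0, 0, 0, 1}, {0, 0, 0, 1}, {0, 0, 0, 1}}));
        for (AbstractMap.SimpleEntry<String, Double> hit : find(acg, known, 3, 0.5)) {
            System.out.println(hit.getKey() + "\t" + hit.getValue());
        }
    }
}
```

In the usage example, the query ACG matches the known matrix CGT with distance 0 only through its reverse complement. TTT stays above the threshold on both strands.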