de.jstacs.utils
Class PFMComparator

java.lang.Object
  extended by de.jstacs.utils.PFMComparator

public class PFMComparator
extends Object

This class implements a number of methods for the comparison of position frequency matrices (PFMs), as described in the Amadeus paper.

Author:
Jens Keilwagen

Nested Class Summary
static class PFMComparator.NormalizedEuclideanDistance
          This class implements the normalized Euclidean distance.
static class PFMComparator.OneMinusPearsonCorrelationCoefficient
          This class implements a distance defined as one minus the Pearson correlation coefficient.
static class PFMComparator.PFMDistance
          This interface declares a method for comparing different PFMs.
static class PFMComparator.SymmetricKullbackLeiblerDivergence
          This class implements the symmetric Kullback-Leibler-divergence.
 
Constructor Summary
PFMComparator()
           
 
Method Summary
static ComparableElement<String,Double>[] find(ComplementableDiscreteAlphabet abc, double[][] pfm, ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs, PFMComparator.PFMDistance distance, int minOverlap, int allowMiniShift, boolean pValues, double threshold)
          This method finds PFMs that are similar to a user-specified PFM pfm in a list of known PFMs.
static String getConsensus(AlphabetContainer con, double[][] pfm)
          This method extracts the consensus sequence from a PFM. The method does not use any degenerate IUPAC code.
static double[] getCounts(DataSet... data)
          This method counts the occurrences of symbols in the given data sets.
static double[][] getPFM(DataSet data)
          This method creates a PFM from a DataSet of Sequences.
static double[][] getPFM(DataSet data, double[] weights)
          This method creates a PFM from a DataSet of Sequences.
static double[][] getReverseComplement(ComplementableDiscreteAlphabet abc, double[][] pfm)
          This method returns the PFM that is the reverse complement of the given PFM.
static String matrixToString(double[][] matrix)
          Returns a string representation of the matrix, where each row of the matrix is printed on one line and columns are separated by tabs.
static void normalize(double[] counts)
          This method enables the user to normalize an array containing counts.
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromEMBL(String fileName, int max)
          This method reads a number of PFMs from a file in EMBL format and returns them in an ArrayList together with some annotation.
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(String filename)
          This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation.
static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromUniprobe(String startdir)
          This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PFMComparator

public PFMComparator()
Method Detail

matrixToString

public static String matrixToString(double[][] matrix)
Returns a string representation of the matrix, where each row of the matrix is printed on one line and columns are separated by tabs.

Parameters:
matrix - the matrix
Returns:
the string representation

getConsensus

public static String getConsensus(AlphabetContainer con,
                                  double[][] pfm)
This method extracts the consensus sequence from a PFM. The method does not use any degenerate IUPAC code.

Parameters:
con - the AlphabetContainer that was used to create the PFM
pfm - the position frequency matrix (a position weight matrix is also possible)
Returns:
the consensus, i.e., the string with the most probable symbol at each position
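
The consensus extraction described above can be illustrated with plain arrays. This is a minimal sketch, not the Jstacs implementation: the class ConsensusSketch, the fixed symbol order A, C, G, T, and the tie-breaking rule (first maximum wins) are assumptions made for illustration, whereas the real method resolves symbols through the AlphabetContainer.

```java
final class ConsensusSketch {
    // Assumed symbol order of the entries in each PFM row (hypothetical;
    // Jstacs derives the symbols from an AlphabetContainer instead).
    static final char[] SYMBOLS = {'A', 'C', 'G', 'T'};

    // Returns the string built from the highest-scoring symbol at each position.
    // pfm[l][a] holds the frequency (or weight) of symbol a at position l.
    static String consensus(double[][] pfm) {
        StringBuilder sb = new StringBuilder(pfm.length);
        for (double[] position : pfm) {
            int best = 0;
            for (int a = 1; a < position.length; a++) {
                // Strict comparison: on ties the first symbol is kept (assumption).
                if (position[a] > position[best]) {
                    best = a;
                }
            }
            sb.append(SYMBOLS[best]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] pfm = {
            {8, 1, 0, 1},
            {0, 0, 10, 0},
            {2, 2, 1, 5}
        };
        System.out.println(consensus(pfm)); // prints AGT
    }
}
```

Because only the maximum per position is reported, no degenerate IUPAC symbols (such as R or N) can appear in the result.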

getCounts

public static double[] getCounts(DataSet... data)
This method counts the occurrences of symbols in the given data sets. This method can therefore be used to create a background distribution.

Parameters:
data - the data sets
Returns:
an array containing the counts

normalize

public static void normalize(double[] counts)
This method enables the user to normalize an array containing counts. After using this method, the array contains probabilities.

Parameters:
counts - the array of counts
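
Counting symbols and normalizing the counts in place can be sketched as follows. This is an illustration under stated assumptions rather than the Jstacs code: sequences are plain Strings over the hypothetical fixed alphabet ACGT instead of DataSets, and an all-zero array is left unchanged, which may differ from the library's behavior.

```java
import java.util.Arrays;

final class CountsSketch {
    // Assumed alphabet; the index of a symbol in this string is its array index.
    static final String SYMBOLS = "ACGT";

    // Counts how often each symbol occurs over all given sequences.
    static double[] counts(String... sequences) {
        double[] counts = new double[SYMBOLS.length()];
        for (String seq : sequences) {
            for (int i = 0; i < seq.length(); i++) {
                int a = SYMBOLS.indexOf(seq.charAt(i));
                if (a < 0) {
                    throw new IllegalArgumentException("unexpected symbol: " + seq.charAt(i));
                }
                counts[a]++;
            }
        }
        return counts;
    }

    // Divides every entry by the sum of all entries, in place.
    static void normalize(double[] counts) {
        double sum = 0;
        for (double c : counts) {
            sum += c;
        }
        if (sum == 0) {
            return; // nothing to normalize (assumed handling of empty counts)
        }
        for (int a = 0; a < counts.length; a++) {
            counts[a] /= sum;
        }
    }

    public static void main(String[] args) {
        double[] background = counts("ACGT", "AACC");
        normalize(background);
        System.out.println(Arrays.toString(background)); // prints [0.375, 0.375, 0.125, 0.125]
    }
}
```

The normalized array can then serve as a background distribution.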

getPFM

public static double[][] getPFM(DataSet data)
This method creates a PFM from a DataSet of Sequences.

Parameters:
data - the DataSet
Returns:
the PFM

getPFM

public static double[][] getPFM(DataSet data,
                                double[] weights)
This method creates a PFM from a DataSet of Sequences.

Parameters:
data - the DataSet
weights - the weights on the sequences in data
Returns:
the PFM
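
How per-sequence weights enter a PFM can be sketched with plain Strings. The class WeightedPfmSketch, the alphabet ACGT, and the convention that a null weights array means weight 1 for every sequence are assumptions for illustration; the sketch is not taken from the Jstacs source and expects a non-empty array of equal-length sequences.

```java
final class WeightedPfmSketch {
    // Assumed alphabet; the index of a symbol in this string is its column index.
    static final String SYMBOLS = "ACGT";

    // Builds pfm[l][a]: the summed weight of all sequences showing symbol a at position l.
    static double[][] pfm(String[] sequences, double[] weights) {
        int length = sequences[0].length();
        double[][] pfm = new double[length][SYMBOLS.length()];
        for (int n = 0; n < sequences.length; n++) {
            if (sequences[n].length() != length) {
                throw new IllegalArgumentException("sequences must have equal length");
            }
            // Assumption: missing weights are treated as 1 per sequence.
            double w = (weights == null) ? 1.0 : weights[n];
            for (int l = 0; l < length; l++) {
                int a = SYMBOLS.indexOf(sequences[n].charAt(l));
                if (a < 0) {
                    throw new IllegalArgumentException("unexpected symbol: " + sequences[n].charAt(l));
                }
                pfm[l][a] += w;
            }
        }
        return pfm;
    }

    public static void main(String[] args) {
        String[] sequences = {"AC", "AG", "TC"};
        double[] weights = {1.0, 2.0, 0.5};
        for (double[] position : pfm(sequences, weights)) {
            System.out.println(java.util.Arrays.toString(position));
        }
    }
}
```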

getReverseComplement

public static double[][] getReverseComplement(ComplementableDiscreteAlphabet abc,
                                              double[][] pfm)
This method returns the PFM that is the reverse complement of the given PFM.

Parameters:
abc - the ComplementableDiscreteAlphabet that is used to create the reverse complement
pfm - the absolute frequencies for each position and symbol; $pfm[\ell][a] := N_{X_\ell=a}$
Returns:
the PFM that is the reverse complement of the given PFM
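
The reverse complement of a PFM reverses the order of the positions and swaps each symbol with its complement. A minimal sketch, assuming the symbol order A, C, G, T so that the complement of index a is 3 - a; the class ReverseComplementSketch is hypothetical, and the real method obtains the complement from the ComplementableDiscreteAlphabet instead.

```java
final class ReverseComplementSketch {
    // Returns a new matrix rc with rc[l][a] = pfm[L-1-l][complement(a)].
    // Assumes the symbol order A, C, G, T, where complement(a) = 3 - a
    // (A <-> T, C <-> G). The input matrix is not modified.
    static double[][] reverseComplement(double[][] pfm) {
        int length = pfm.length;
        double[][] rc = new double[length][];
        for (int l = 0; l < length; l++) {
            double[] source = pfm[length - 1 - l];
            rc[l] = new double[source.length];
            for (int a = 0; a < source.length; a++) {
                rc[l][a] = source[source.length - 1 - a];
            }
        }
        return rc;
    }

    public static void main(String[] args) {
        double[][] pfm = {
            {1, 2, 3, 4},
            {5, 6, 7, 8}
        };
        for (double[] position : reverseComplement(pfm)) {
            System.out.println(java.util.Arrays.toString(position));
        }
    }
}
```

Applying the operation twice returns the original matrix, which is a convenient sanity check.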

readPFMsFromUniprobe

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromUniprobe(String startdir)
                                                                                  throws IOException
This method reads a number of PFMs from a directory as downloaded from the Uniprobe database and returns them in an ArrayList together with some annotation.

Parameters:
startdir - the directory containing the PFMs in Uniprobe format
Returns:
an ArrayList containing AbstractMap.SimpleEntrys with annotation and PFM
Throws:
IOException - if something went wrong while reading a file

readPFMsFromJasparFastA

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromJasparFastA(String filename)
                                                                                     throws IOException
This method reads a number of PFMs from a file in the Jaspar FastA-like format and returns them in an ArrayList together with some annotation.

Parameters:
filename - the path to the file containing the Jaspar PFMs
Returns:
an ArrayList containing AbstractMap.SimpleEntrys with annotation and PFM
Throws:
IOException - if something went wrong while reading the file

readPFMsFromEMBL

public static ArrayList<AbstractMap.SimpleEntry<String,double[][]>> readPFMsFromEMBL(String fileName,
                                                                                     int max)
                                                                              throws IOException
This method reads a number of PFMs from a file in EMBL format and returns them in an ArrayList together with some annotation.

Parameters:
fileName - the name of the file containing the PFMs in EMBL format
max - the maximal number of PFMs that should be read
Returns:
an ArrayList containing AbstractMap.SimpleEntrys with annotation and PFM
Throws:
IOException - if something went wrong while reading the file

find

public static ComparableElement<String,Double>[] find(ComplementableDiscreteAlphabet abc,
                                                      double[][] pfm,
                                                      ArrayList<AbstractMap.SimpleEntry<String,double[][]>> knownPFMs,
                                                      PFMComparator.PFMDistance distance,
                                                      int minOverlap,
                                                      int allowMiniShift,
                                                      boolean pValues,
                                                      double threshold)
This method finds PFMs that are similar to a user-specified PFM pfm in a list of known PFMs.

Parameters:
abc - the alphabet of this PFM
pfm - the PFM to be compared to the known PFMs
knownPFMs - an ArrayList of known PFMs
distance - a distance measure to assess the similarity of the PFMs
minOverlap - the minimal number of consecutive positions in an alignment
allowMiniShift - the minimal number of allowed shifts (even if the minimal overlap is bigger)
pValues - compute p-values for matrix similarity
threshold - only hits with a distance below this threshold will be reported
Returns:
an array of similar PFMs and their annotations
See Also:
readPFMsFromEMBL(String, int)
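
The role of minOverlap in such a comparison can be illustrated with a simplified, self-contained sketch. It is not the algorithm used by find: the class PfmAlignmentSketch is hypothetical, plain Euclidean distance per position stands in for a PFMDistance, the rows are assumed to be normalized already, and reverse complements, allowMiniShift, p-values, and the threshold filter are left out.

```java
final class PfmAlignmentSketch {
    // Euclidean distance between the symbol distributions of two positions.
    static double positionDistance(double[] p, double[] q) {
        double sum = 0;
        for (int a = 0; a < p.length; a++) {
            double d = p[a] - q[a];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Smallest mean per-position distance over all ungapped alignments of x and y
    // that share at least minOverlap positions. Returns positive infinity if no
    // alignment satisfies the overlap constraint.
    static double bestDistance(double[][] x, double[][] y, int minOverlap) {
        double best = Double.POSITIVE_INFINITY;
        // offset = position of y[0] relative to x[0]; negative means y starts before x.
        for (int offset = minOverlap - y.length; offset <= x.length - minOverlap; offset++) {
            int start = Math.max(0, offset);
            int end = Math.min(x.length, offset + y.length);
            int overlap = end - start;
            if (overlap < minOverlap) {
                continue;
            }
            double sum = 0;
            for (int l = start; l < end; l++) {
                sum += positionDistance(x[l], y[l - offset]);
            }
            best = Math.min(best, sum / overlap);
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}};
        double[][] y = {{0, 1, 0, 0}, {0, 0, 1, 0}};
        // y matches positions 1 and 2 of x exactly, so the best distance is 0.
        System.out.println(bestDistance(x, y, 2)); // prints 0.0
    }
}
```

In this sketch a hit would be reported only if the returned value lies below the chosen threshold; a larger minOverlap excludes alignments in which only a few flanking positions coincide.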