de.jstacs.models.utils
Class ModelTester

java.lang.Object
  extended by de.jstacs.models.utils.ModelTester

public class ModelTester
extends Object

This class provides tests for arbitrary (discrete) models. It implements several statistics (log-likelihood, Shannon entropy, AIC, BIC, ...) for comparing models.

Author:
Jens Keilwagen
See Also:
AbstractModel

Constructor Summary
ModelTester()
           
 
Method Summary
static double getKLDivergence(Model m1, Model m2, int length)
          Returns the Kullback-Leibler-divergence D(p_m1||p_m2).
static double getLogLikelihood(Model m, Sample data)
          Returns the log-likelihood of a Sample data for a given model m.
static double getLogLikelihood(Model m, Sample data, double[] weights)
          Returns the log-likelihood of a Sample data for a given model m.
static double getMarginalDistribution(Model m, int[] constraint)
          This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible.
static double getMaxOfDeviation(Model m1, Model m2, int length)
          This method computes the maximum deviation between the probabilities that the discrete models m1 and m2 assign to all sequences of the given length.
static Sequence getMostProbableSequence(Model m, int length)
          Returns one most probable sequence for the discrete model m.
static double getShannonEntropy(Model m, int length)
          This method computes the Shannon entropy for any discrete model m over all sequences of the given length, if possible.
static double getShannonEntropyInBits(Model m, int length)
          This method computes the Shannon entropy in bits for any discrete model m over all sequences of the given length, if possible.
static double getSumOfDeviation(Model m1, Model m2, int length)
          This method computes the sum of deviations between the probabilities that the discrete models m1 and m2 assign to all sequences of the given length.
static double getSumOfDistribution(Model m, int length)
          This method computes the sum of the probabilities that a discrete model m assigns to all sequences of the given length, if possible.
static double getSymKLDivergence(Model m1, Model m2, int length)
          Returns the symmetric Kullback-Leibler-divergence, i.e. D(p_m1||p_m2) + D(p_m2||p_m1).
static double getValueOfAIC(Model m, Sample s, int k)
          This method computes the value of Akaike's Information Criterion (AIC).
static double getValueOfBIC(Model m, Sample s, int k)
          This method computes the value of the Bayesian Information Criterion (BIC).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ModelTester

public ModelTester()
Method Detail

getKLDivergence

public static double getKLDivergence(Model m1,
                                     Model m2,
                                     int length)
                              throws Exception
Returns the Kullback-Leibler-divergence D(p_m1||p_m2).

Computes \sum_x p(x|m1) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the Kullback-Leibler-divergence
Throws:
Exception - if something went wrong
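
As a minimal illustration of the formula above, the following sketch evaluates the sum over two explicitly tabulated distributions. It does not call Jstacs: the class KlSketch and its double[] inputs are hypothetical stand-ins for the probabilities p(x|m1) and p(x|m2) of the enumerated sequences. The natural logarithm and the convention 0 * log 0 = 0 are assumptions of this sketch.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class KlSketch {

    /**
     * Computes sum_x p[x] * log(p[x] / q[x]) with the natural logarithm.
     * p and q hold the probabilities of the same enumerated sequences.
     */
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("p and q must have the same length");
        }
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0.0) { // assumed convention: terms with p[x] == 0 contribute 0
                sum += p[x] * Math.log(p[x] / q[x]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println("D(p||q) = " + klDivergence(p, q));
        System.out.println("D(q||p) = " + klDivergence(q, p)); // generally a different value
    }
}
```

The two printed values differ in general, because the divergence is not symmetric in its arguments.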

getSymKLDivergence

public static double getSymKLDivergence(Model m1,
                                        Model m2,
                                        int length)
                                 throws Exception
Returns the symmetric Kullback-Leibler-divergence, i.e. the sum of both directed divergences, D(p_m1||p_m2) + D(p_m2||p_m1).

Computes \sum_x (p(x|m1)-p(x|m2)) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the symmetric Kullback-Leibler-divergence
Throws:
Exception - if something went wrong
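
Splitting the sum in the formula above term by term shows that it equals D(p_m1||p_m2) + D(p_m2||p_m1). The sketch below checks this identity numerically on tabulated distributions. It is a stand-alone illustration, not Jstacs code: SymKlSketch and its double[] inputs are hypothetical, and all probabilities are assumed to be strictly positive.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class SymKlSketch {

    /** Directed divergence sum_x p[x] * log(p[x] / q[x]); assumes p[x], q[x] > 0. */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += p[x] * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    /** Symmetric form sum_x (p[x] - q[x]) * log(p[x] / q[x]); assumes p[x], q[x] > 0. */
    static double symKl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += (p[x] - q[x]) * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.3, 0.2};
        double[] q = {0.2, 0.2, 0.6};
        // Both lines print the same value up to rounding.
        System.out.println(symKl(p, q));
        System.out.println(kl(p, q) + kl(q, p));
    }
}
```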

getLogLikelihood

public static double getLogLikelihood(Model m,
                                      Sample data)
                               throws Exception
Returns the log-likelihood of a Sample data for a given model m.

Parameters:
m - the given model
data - the Sample
Returns:
the log-likelihood of data
Throws:
Exception - if something went wrong

getLogLikelihood

public static double getLogLikelihood(Model m,
                                      Sample data,
                                      double[] weights)
                               throws Exception
Returns the log-likelihood of a Sample data for a given model m.

Parameters:
m - the given model
data - the Sample
weights - the weight for each element of the Sample
Returns:
the log-likelihood of data
Throws:
Exception - if something went wrong
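
The following sketch shows one plausible reading of the weighted variant: each sequence contributes its log-probability multiplied by its weight. This is an assumption, since the documentation does not spell out how the weights enter. The class LogLikelihoodSketch, the use of String for sequences and the ToDoubleFunction that plays the role of the model are all hypothetical.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class LogLikelihoodSketch {

    /**
     * Returns sum_i weights[i] * logProb(data[i]).
     * A null weights array is treated as "every weight is 1" (assumption).
     */
    static double logLikelihood(ToDoubleFunction<String> logProb, String[] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("one weight per sequence is required");
        }
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            sum += w * logProb.applyAsDouble(data[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy model: every symbol of a 4-letter alphabet has probability 0.25.
        ToDoubleFunction<String> uniform = s -> s.length() * Math.log(0.25);
        String[] data = {"ACGT", "AC"};
        System.out.println(logLikelihood(uniform, data, null));
        System.out.println(logLikelihood(uniform, data, new double[]{2.0, 0.5}));
    }
}
```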

getMarginalDistribution

public static double getMarginalDistribution(Model m,
                                             int[] constraint)
                                      throws Exception
This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible.

Parameters:
m - a discrete model
constraint - constraint[i] < 0 stands for an irrelevant position; constraint[i] = c with 0 <= c < m.getAlphabets()[(m.getLength()==0)?0:i].getAlphabetLength() is the encoded character at position i
Returns:
the marginal distribution for a discrete model
Throws:
Exception - if something went wrong
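
To make the constraint encoding concrete, the sketch below sums the probabilities of all sequences that agree with the constraint at every fixed position and ranges over the whole alphabet at every position marked with a negative value. It is a brute-force illustration, not the Jstacs implementation: MarginalSketch, the ToDoubleFunction that stands in for the model and the single alphabetSize shared by all positions are simplifying assumptions.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class MarginalSketch {

    /** Sums prob(x) over all sequences x with x[i] == constraint[i] wherever constraint[i] >= 0. */
    static double marginal(ToDoubleFunction<int[]> prob, int alphabetSize, int[] constraint) {
        return sum(prob, alphabetSize, constraint, new int[constraint.length], 0);
    }

    private static double sum(ToDoubleFunction<int[]> prob, int alphabetSize,
                              int[] constraint, int[] seq, int pos) {
        if (pos == seq.length) {
            return prob.applyAsDouble(seq); // sequence is complete
        }
        if (constraint[pos] >= 0) {
            seq[pos] = constraint[pos]; // fixed position
            return sum(prob, alphabetSize, constraint, seq, pos + 1);
        }
        double total = 0.0;
        for (int c = 0; c < alphabetSize; c++) { // irrelevant position: marginalize
            seq[pos] = c;
            total += sum(prob, alphabetSize, constraint, seq, pos + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] w = {0.1, 0.2, 0.3, 0.4}; // toy model with independent positions
        ToDoubleFunction<int[]> model = s -> {
            double p = 1.0;
            for (int c : s) {
                p *= w[c];
            }
            return p;
        };
        // Fix position 1 to the encoded character 2, leave positions 0 and 2 free.
        System.out.println(marginal(model, 4, new int[]{-1, 2, -1}));
    }
}
```

The running time grows exponentially with the number of free positions, which is presumably why the method description is qualified with "if possible".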

getMaxOfDeviation

public static double getMaxOfDeviation(Model m1,
                                       Model m2,
                                       int length)
                                throws Exception
This method computes the maximum deviation between the probabilities that the discrete models m1 and m2 assign to all sequences of the given length.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the maximum deviation between the probabilities
Throws:
Exception - if something went wrong

getMostProbableSequence

public static Sequence getMostProbableSequence(Model m,
                                               int length)
                                        throws Exception
Returns one most probable sequence for the discrete model m. (There may be more than one most probable sequence; in this case only one of them is returned.)

This is only a standard implementation. For some special models, such as Markov models, the most probable sequence can be found much faster with a dynamic programming algorithm.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
one most probable sequence
Throws:
Exception - if something went wrong
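
A generic search of this kind can be sketched as a brute-force enumeration of all sequences of the requested length. The code below is an illustration under assumptions, not the Jstacs implementation: MostProbableSketch, the integer encoding of sequences, the single alphabetSize and the ToDoubleFunction standing in for the model are hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class MostProbableSketch {

    /** Visits all alphabetSize^length sequences and returns the first one with maximal probability. */
    static int[] mostProbable(ToDoubleFunction<int[]> prob, int alphabetSize, int length) {
        int[] seq = new int[length];
        int[] best = seq.clone();
        double bestP = prob.applyAsDouble(seq);
        while (increment(seq, alphabetSize)) {
            double p = prob.applyAsDouble(seq);
            if (p > bestP) { // strict comparison: ties keep the earlier sequence
                bestP = p;
                best = seq.clone();
            }
        }
        return best;
    }

    /** Advances seq like an odometer; returns false once it wraps around to all zeros. */
    private static boolean increment(int[] seq, int alphabetSize) {
        for (int i = seq.length - 1; i >= 0; i--) {
            if (++seq[i] < alphabetSize) {
                return true;
            }
            seq[i] = 0;
        }
        return false;
    }

    public static void main(String[] args) {
        double[] w = {0.1, 0.6, 0.3}; // toy model with independent positions
        ToDoubleFunction<int[]> model = s -> {
            double p = 1.0;
            for (int c : s) {
                p *= w[c];
            }
            return p;
        };
        System.out.println(Arrays.toString(mostProbable(model, 3, 4)));
    }
}
```

The enumeration costs one probability evaluation per sequence, which is the exponential cost that a model-specific dynamic programming algorithm avoids.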

getShannonEntropy

public static double getShannonEntropy(Model m,
                                       int length)
                                throws Exception
This method computes the Shannon entropy for any discrete model m over all sequences of the given length, if possible.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the Shannon entropy for a discrete model
Throws:
Exception - if something went wrong

getShannonEntropyInBits

public static double getShannonEntropyInBits(Model m,
                                             int length)
                                      throws Exception
This method computes the Shannon entropy in bits for any discrete model m over all sequences of the given length, if possible.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the Shannon entropy in bits for a discrete model
Throws:
Exception - if something went wrong
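
The relation between the two entropy variants can be sketched on a tabulated distribution. The sketch assumes that the plain variant uses the natural logarithm, which the documentation does not state explicitly; under that assumption, the value in bits is obtained by dividing by log 2. EntropySketch and its double[] input are hypothetical and independent of Jstacs.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class EntropySketch {

    /** Shannon entropy -sum_x p[x] * log(p[x]) with the natural logarithm. */
    static double entropyNats(double[] p) {
        double h = 0.0;
        for (double px : p) {
            if (px > 0.0) { // convention: 0 * log 0 = 0
                h -= px * Math.log(px);
            }
        }
        return h;
    }

    /** Same entropy expressed in bits (base-2 logarithm). */
    static double entropyBits(double[] p) {
        return entropyNats(p) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        double[] skewed = {0.7, 0.1, 0.1, 0.1};
        System.out.println(entropyBits(uniform)); // maximal for four outcomes
        System.out.println(entropyBits(skewed));  // smaller than the uniform case
    }
}
```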

getSumOfDeviation

public static double getSumOfDeviation(Model m1,
                                       Model m2,
                                       int length)
                                throws Exception
This method computes the sum of deviations between the probabilities that the discrete models m1 and m2 assign to all sequences of the given length.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the sum of deviations between the probabilities
Throws:
Exception - if something went wrong
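
The sum of deviations and the maximum deviation documented for getMaxOfDeviation can be sketched side by side on tabulated probabilities. The sketch assumes that "deviation" means the absolute difference |p(x|m1) - p(x|m2)|, which is not stated explicitly in the documentation. DeviationSketch and its double[] inputs are hypothetical and do not use Jstacs.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class DeviationSketch {

    /** Largest absolute difference max_x |p[x] - q[x]|. */
    static double maxDeviation(double[] p, double[] q) {
        double max = 0.0;
        for (int x = 0; x < p.length; x++) {
            max = Math.max(max, Math.abs(p[x] - q[x]));
        }
        return max;
    }

    /** Sum of absolute differences sum_x |p[x] - q[x]|. */
    static double sumDeviation(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += Math.abs(p[x] - q[x]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.3, 0.2};
        double[] q = {0.2, 0.2, 0.6};
        System.out.println("max = " + maxDeviation(p, q));
        System.out.println("sum = " + sumDeviation(p, q));
    }
}
```

Under this reading both statistics are 0 exactly when the two models assign identical probabilities to every sequence, and the sum is bounded by 2 for two proper distributions.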

getSumOfDistribution

public static double getSumOfDistribution(Model m,
                                          int length)
                                   throws Exception
This method computes the sum of the probabilities that a discrete model m assigns to all sequences of the given length, if possible. It can therefore hint at whether a model defines a proper distribution or whether its implementation contains mistakes.

The expected value is 1.0, but because of limited floating-point precision the result will rarely be exactly 1.0.

Math.abs( 1.0d - getSumOfDistribution( m, length ) ) should be smaller than 1E-10.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be Model.getLength())
Returns:
the sum of the probabilities of all sequences of the given length
Throws:
Exception - if something went wrong
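
The normalization check described above can be sketched with a toy model whose positions are independent. The code is a stand-alone illustration and does not call Jstacs: SumOfDistributionSketch and the double[][] table of per-position symbol probabilities are hypothetical, and only the tolerance of 1E-10 is taken from the text.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class SumOfDistributionSketch {

    /** Sums prod_i table[i][x_i] over all sequences x by brute-force recursion. */
    static double sumOfDistribution(double[][] table) {
        return sum(table, 0, 1.0);
    }

    private static double sum(double[][] table, int pos, double prefix) {
        if (pos == table.length) {
            return prefix; // probability of one complete sequence
        }
        double total = 0.0;
        for (double p : table[pos]) {
            total += sum(table, pos + 1, prefix * p);
        }
        return total;
    }

    /** Applies the tolerance suggested in the documentation. */
    static boolean looksNormalized(double[][] table) {
        return Math.abs(1.0d - sumOfDistribution(table)) < 1E-10;
    }

    public static void main(String[] args) {
        double[][] ok = {{0.1, 0.2, 0.3, 0.4}, {0.25, 0.25, 0.25, 0.25}};
        double[][] broken = {{0.1, 0.2, 0.3, 0.3}, {0.25, 0.25, 0.25, 0.25}};
        System.out.println(looksNormalized(ok));     // first row sums to 1
        System.out.println(looksNormalized(broken)); // first row sums to 0.9
    }
}
```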

getValueOfAIC

public static double getValueOfAIC(Model m,
                                   Sample s,
                                   int k)
                            throws Exception
This method computes the value of Akaike's Information Criterion (AIC). It uses the formula: AIC = 2 * log L(t,x) - 2*k, where L(t,x) is the likelihood of the Sample and k is the number of parameters in the model.

The value of the AIC can be used for model selection. With the sign convention of this formula, a larger value indicates the preferred model.

Parameters:
m - a trained model
s - the Sample for the test
k - the number of parameters of the model m
Returns:
the value of AIC
Throws:
Exception - if something went wrong
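
The documented formula can be applied by hand once a log-likelihood is available. The sketch below is a stand-alone illustration, not the Jstacs implementation: AicSketch is hypothetical, and the log-likelihood values in main are made-up numbers chosen only to show the trade-off between fit and parameter count.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class AicSketch {

    /** AIC = 2 * log L - 2 * k, following the formula given in the documentation. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }

    public static void main(String[] args) {
        // Illustrative values: the larger model fits slightly better but uses more parameters.
        double small = aic(-1050.0, 3);
        double large = aic(-1045.0, 12);
        System.out.println("small model: " + small);
        System.out.println("large model: " + large);
        // With this sign convention the model with the larger value is preferred.
        System.out.println(small > large ? "prefer small model" : "prefer large model");
    }
}
```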

getValueOfBIC

public static double getValueOfBIC(Model m,
                                   Sample s,
                                   int k)
                            throws Exception
This method computes the value of the Bayesian Information Criterion (BIC). It uses the formula: BIC = 2 * log L(t,x) - k * log n, where L(t,x) is the likelihood of the Sample, k is the number of parameters in the model and n is the number of sequences in the Sample.

The value of the BIC can be used for model selection. With the sign convention of this formula, a larger value indicates the preferred model.

Parameters:
m - a trained model
s - the Sample for the test
k - the number of parameters of the model m
Returns:
the value of BIC
Throws:
Exception - if something went wrong
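
The BIC formula differs from the AIC formula documented for getValueOfAIC only in the penalty per parameter, which is log n instead of 2. The sketch below is a stand-alone illustration, not the Jstacs implementation: BicSketch is hypothetical, the natural logarithm is an assumption, and the numbers in main are made up.

```java
/** Hypothetical stand-alone sketch; not part of Jstacs. */
final class BicSketch {

    /** BIC = 2 * log L - k * log n, with n the number of sequences in the sample. */
    static double bic(double logLikelihood, int k, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 2.0 * logLikelihood - k * Math.log(n);
    }

    public static void main(String[] args) {
        int n = 500; // illustrative sample size
        System.out.println("small model: " + bic(-1050.0, 3, n));
        System.out.println("large model: " + bic(-1045.0, 12, n));
        System.out.println("penalty per parameter: " + Math.log(n));
    }
}
```

Under the natural-logarithm assumption the penalty per parameter exceeds the AIC penalty of 2 as soon as n is at least 8, so for larger samples this criterion penalizes additional parameters more strongly.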