de.jstacs.models.utils
Class ModelTester

java.lang.Object
  extended by de.jstacs.models.utils.ModelTester

public class ModelTester
extends Object

This class provides tests for arbitrary (discrete) models. It implements several statistics (log-likelihood, Shannon entropy, AIC, BIC, ...) for comparing models.

Author:
Jens Keilwagen
See Also:
AbstractModel

Nested Class Summary
static class ModelTester.SeqIterator
           
 
Constructor Summary
ModelTester()
           
 
Method Summary
static double getKLDivergence(Model m1, Model m2, int length)
          Returns the Kullback-Leibler-divergence D(p_m1||p_m2).
static double getLogLikelihood(Model m, Sample data)
          Returns the log-likelihood of the sample data for a given model m.
static double getLogLikelihood(Model m, Sample data, double[] weights)
          Returns the log-likelihood of the sample data for a given model m.
static double getMarginalDistribution(Model m, int[] constraint)
          This method computes the marginal distribution for any discrete model m and all sequences that fulfil the constraint, if possible.
static double getMaxOfDeviation(Model m1, Model m2, int length)
          This method computes the maximum deviation between the probabilities of all sequences of the given length for the discrete models m1 and m2.
static Sequence getMostProbableSequence(Model m, int length)
          Returns one most probable sequence for the discrete model m.
static double getShannonEntropy(Model m, int length)
          This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible.
static double getShannonEntropyInBits(Model m, int length)
          This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible.
static double getSumOfDeviation(Model m1, Model m2, int length)
          This method computes the sum of deviations between the probabilities of all sequences of the given length for the discrete models m1 and m2.
static double getSumOfDistribution(Model m, int length)
          This method computes the sum of the probabilities of all sequences of the given length for any discrete model m, if possible.
static double getSymKLDivergence(Model m1, Model m2, int length)
          Returns the sum of the Kullback-Leibler-divergences, i.e. D(p_m1||p_m2) + D(p_m2||p_m1).
static double getValueOfAIC(Model m, Sample s, int k)
          This method computes the value of Akaike's Information Criterion (AIC).
static double getValueOfBIC(Model m, Sample s, int k)
          This method computes the value of the Bayesian Information Criterion (BIC).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ModelTester

public ModelTester()
Method Detail

getKLDivergence

public static double getKLDivergence(Model m1,
                                     Model m2,
                                     int length)
                              throws Exception
Returns the Kullback-Leibler-divergence D(p_m1||p_m2).

Computes \sum_x p(x|m1) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be getLength())
Returns:
the Kullback-Leibler-divergence
Throws:
Exception
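
The formula above can be illustrated without any model classes. The following is a minimal sketch, not the implementation behind getKLDivergence: the class KlDivergenceSketch and its method are hypothetical, plain arrays stand in for the probabilities p(x|m1) and p(x|m2) of all sequences x, and the natural logarithm is an assumption.

```java
// Illustrative sketch only; plain arrays replace the Model objects.
final class KlDivergenceSketch {

    /**
     * Computes sum_x p[x] * ln(p[x] / q[x]).
     * Terms with p[x] == 0 contribute 0; if q[x] == 0 while p[x] > 0
     * the result is positive infinity.
     */
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions must have equal length");
        }
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0.0) {
                sum += p[x] * Math.log(p[x] / q[x]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println("D(p||q) = " + klDivergence(p, q));
        System.out.println("D(q||p) = " + klDivergence(q, p));
    }
}
```

The two printed values differ in general, because the Kullback-Leibler divergence is not symmetric in its arguments.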

getSymKLDivergence

public static double getSymKLDivergence(Model m1,
                                        Model m2,
                                        int length)
                                 throws Exception
Returns the sum of the Kullback-Leibler-divergences, i.e. D(p_m1||p_m2) + D(p_m2||p_m1).

Computes \sum_x (p(x|m1)-p(x|m2)) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be getLength())
Returns:
the sum of the Kullback-Leibler-divergences
Throws:
Exception
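
Expanding the formula above gives \sum_x p(x|m1) * \log \frac{p(x|m1)}{p(x|m2)} + \sum_x p(x|m2) * \log \frac{p(x|m2)}{p(x|m1)}, i.e. the two directed divergences added together. The sketch below checks this numerically. The class SymKlSketch is hypothetical, arrays stand in for the model probabilities, and all probabilities are assumed to be strictly positive.

```java
// Illustrative sketch only; assumes strictly positive probabilities.
final class SymKlSketch {

    /** Computes sum_x (p[x] - q[x]) * ln(p[x] / q[x]). */
    static double symKl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += (p[x] - q[x]) * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    /** Directed divergence sum_x p[x] * ln(p[x] / q[x]), used for comparison. */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += p[x] * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.2, 0.3, 0.5};
        double[] q = {0.4, 0.4, 0.2};
        System.out.println("symmetric form:      " + symKl(p, q));
        System.out.println("D(p||q) + D(q||p):   " + (kl(p, q) + kl(q, p)));
    }
}
```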

getLogLikelihood

public static double getLogLikelihood(Model m,
                                      Sample data)
                               throws Exception
Returns the log-likelihood of the sample data for a given model m.

Parameters:
m - given model
data - the sample
Returns:
the log-likelihood
Throws:
Exception - if something went wrong

getLogLikelihood

public static double getLogLikelihood(Model m,
                                      Sample data,
                                      double[] weights)
                               throws Exception
Returns the log-likelihood of the sample data for a given model m.

Parameters:
m - given model
data - the sample
weights - the weights for each element of the sample
Returns:
the log-likelihood
Throws:
Exception - if something went wrong
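
The documentation does not state how the weights enter the computation. A common convention, assumed here, is to multiply each sequence's log-probability by its weight and add the results. In this sketch the class WeightedLogLikelihoodSketch is hypothetical, the per-sequence log-probabilities are passed in directly, and treating null weights as all ones is a choice made for this sketch only.

```java
// Illustrative sketch only; the weighting convention is an assumption.
final class WeightedLogLikelihoodSketch {

    /**
     * Returns sum_i weights[i] * logProbs[i].
     * A null weights array is treated as "every weight is 1" (sketch convention).
     */
    static double logLikelihood(double[] logProbs, double[] weights) {
        if (weights != null && weights.length != logProbs.length) {
            throw new IllegalArgumentException("one weight per sequence is required");
        }
        double sum = 0.0;
        for (int i = 0; i < logProbs.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            sum += w * logProbs[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] logProbs = {Math.log(0.5), Math.log(0.25)};
        System.out.println(logLikelihood(logProbs, null));
        System.out.println(logLikelihood(logProbs, new double[] {2.0, 1.0}));
    }
}
```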

getMarginalDistribution

public static double getMarginalDistribution(Model m,
                                             int[] constraint)
                                      throws Exception
This method computes the marginal distribution for any discrete model m and all sequences that fulfil the constraint, if possible.

Parameters:
m - a discrete model
constraint - constraint[i] < 0 stands for an irrelevant position; constraint[i] = c with 0 <= c < m.getAlphabets()[(m.getLength()==0)?0:i].getAlphabetLength() is the encoded character at position i
Throws:
Exception
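
The constraint encoding can be sketched as follows: positions with a negative entry are summed over, positions with a non-negative entry are fixed. This is an assumption-laden illustration, not the library's code. The class MarginalSketch is hypothetical, and a table of independent per-position symbol probabilities stands in for the model.

```java
// Illustrative sketch only; probs[i][c] is a stand-in model with independent positions.
final class MarginalSketch {

    /** Probability of one encoded sequence under the stand-in model. */
    static double sequenceProbability(double[][] probs, int[] seq) {
        double p = 1.0;
        for (int i = 0; i < seq.length; i++) {
            p *= probs[i][seq[i]];
        }
        return p;
    }

    /** Sums the probabilities of all sequences that agree with the constraint. */
    static double marginal(double[][] probs, int[] constraint) {
        int n = constraint.length;
        int[] seq = new int[n];
        for (int i = 0; i < n; i++) {
            seq[i] = Math.max(constraint[i], 0); // fixed symbol, or 0 for a free position
        }
        double sum = 0.0;
        while (true) {
            sum += sequenceProbability(probs, seq);
            // Advance the free positions like an odometer; fixed positions are skipped.
            int i = n - 1;
            while (i >= 0) {
                if (constraint[i] < 0) {
                    if (++seq[i] < probs[i].length) {
                        break;
                    }
                    seq[i] = 0;
                }
                i--;
            }
            if (i < 0) {
                return sum; // every free position has wrapped around
            }
        }
    }

    public static void main(String[] args) {
        double[][] probs = {{0.1, 0.9}, {0.3, 0.7}, {0.5, 0.5}};
        // Symbol 1 at position 0, any symbol at position 1, symbol 0 at position 2.
        System.out.println(marginal(probs, new int[] {1, -1, 0}));
    }
}
```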

getMaxOfDeviation

public static double getMaxOfDeviation(Model m1,
                                       Model m2,
                                       int length)
                                throws Exception
This method computes the maximum deviation between the probabilities of all sequences of the given length for the discrete models m1 and m2.

Throws:
Exception

getMostProbableSequence

public static Sequence getMostProbableSequence(Model m,
                                               int length)
                                        throws Exception
Returns one most probable sequence for the discrete model m. (There may be more than one most probable sequence. In this case only one is returned.)

This is only the standard implementation. For some special models, such as Markov models, the sequence can be computed much faster with a dynamic programming algorithm.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be getLength())
Returns:
one most probable sequence
Throws:
Exception
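
A generic approach that works for any discrete model is exhaustive enumeration, which is presumably what "standard implementation" refers to; that reading is an assumption. In the sketch below the class MostProbableSketch is hypothetical, the model is represented by a function from an encoded sequence to its probability, and a single alphabet size is assumed for all positions. The cost grows as alphabetSize^length.

```java
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; brute-force search over all encoded sequences.
final class MostProbableSketch {

    /** Returns one sequence with maximal probability (the first one found on ties). */
    static int[] mostProbable(ToDoubleFunction<int[]> prob, int alphabetSize, int length) {
        int[] seq = new int[length];
        int[] best = seq.clone();
        double bestP = prob.applyAsDouble(seq);
        while (next(seq, alphabetSize)) {
            double p = prob.applyAsDouble(seq);
            if (p > bestP) {
                bestP = p;
                best = seq.clone();
            }
        }
        return best;
    }

    /** Advances seq to the next sequence; returns false after the last one. */
    static boolean next(int[] seq, int alphabetSize) {
        for (int i = seq.length - 1; i >= 0; i--) {
            if (++seq[i] < alphabetSize) {
                return true;
            }
            seq[i] = 0;
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] w = {{0.1, 0.9}, {0.6, 0.4}, {0.2, 0.8}};
        int[] best = mostProbable(s -> w[0][s[0]] * w[1][s[1]] * w[2][s[2]], 2, 3);
        System.out.println(java.util.Arrays.toString(best));
    }
}
```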

getShannonEntropy

public static double getShannonEntropy(Model m,
                                       int length)
                                throws Exception
This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible.

Throws:
Exception

getShannonEntropyInBits

public static double getShannonEntropyInBits(Model m,
                                             int length)
                                      throws Exception
This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible.

Throws:
Exception
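
The relation between the two entropy methods can be sketched as follows, assuming that getShannonEntropy uses the natural logarithm (the documentation does not say so explicitly). The class ShannonEntropySketch is hypothetical, and an array of sequence probabilities stands in for the model.

```java
// Illustrative sketch only; p[x] holds the probability of the x-th sequence.
final class ShannonEntropySketch {

    /** Entropy in nats: -sum_x p[x] * ln p[x], with 0 * ln 0 taken as 0. */
    static double entropy(double[] p) {
        double h = 0.0;
        for (double px : p) {
            if (px > 0.0) {
                h -= px * Math.log(px);
            }
        }
        return h;
    }

    /** Entropy in bits: the value in nats divided by ln 2. */
    static double entropyInBits(double[] p) {
        return entropy(p) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        System.out.println("nats: " + entropy(uniform));
        System.out.println("bits: " + entropyInBits(uniform));
    }
}
```

For a uniform distribution over N sequences the entropy in bits equals log2 N, which makes a convenient sanity check.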

getSumOfDeviation

public static double getSumOfDeviation(Model m1,
                                       Model m2,
                                       int length)
                                throws Exception
This method computes the sum of deviations between the probabilities of all sequences of the given length for the discrete models m1 and m2.

Throws:
Exception
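
The documentation does not define "deviation". The sketch below assumes it means the absolute difference |p(x|m1) - p(x|m2)| and shows both the sum (this method) and the maximum (getMaxOfDeviation) under that assumption. The class DeviationSketch is hypothetical, and arrays stand in for the model probabilities.

```java
// Illustrative sketch only; "deviation" is assumed to be the absolute difference.
final class DeviationSketch {

    /** Largest absolute difference between corresponding probabilities. */
    static double maxOfDeviation(double[] p, double[] q) {
        double max = 0.0;
        for (int x = 0; x < p.length; x++) {
            max = Math.max(max, Math.abs(p[x] - q[x]));
        }
        return max;
    }

    /** Sum of absolute differences between corresponding probabilities. */
    static double sumOfDeviation(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += Math.abs(p[x] - q[x]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.3, 0.2};
        double[] q = {0.4, 0.4, 0.2};
        System.out.println("max: " + maxOfDeviation(p, q));
        System.out.println("sum: " + sumOfDeviation(p, q));
    }
}
```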

getSumOfDistribution

public static double getSumOfDistribution(Model m,
                                          int length)
                                   throws Exception
This method computes the sum of the probabilities of all sequences of the given length for any discrete model m, if possible. It can therefore be used to give a hint whether a model defines a proper distribution or whether its implementation contains mistakes.

The expected value is 1.0, but because of the finite precision of floating-point arithmetic in Java, a result of exactly 1.0 is unrealistic.

Math.abs( 1.0d - getSumOfDistribution( m, length ) ) should be smaller than 1E-10.

Throws:
Exception
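
The normalization check can be sketched without the library. In the following illustration the class SumOfDistributionSketch is hypothetical, and the stand-in model draws every position independently from the same symbol distribution; only the tolerance 1E-10 is taken from the text above.

```java
// Illustrative sketch only; an i.i.d. stand-in model replaces the Model object.
final class SumOfDistributionSketch {

    /** Sums the probabilities of all sequences of the given length. */
    static double sumOverAllSequences(double[] symbolProbs, int length) {
        int a = symbolProbs.length;
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= a; // number of sequences: a^length
        }
        double sum = 0.0;
        for (long idx = 0; idx < total; idx++) {
            long rest = idx;
            double p = 1.0;
            for (int i = 0; i < length; i++) {
                p *= symbolProbs[(int) (rest % a)]; // decode idx digit by digit
                rest /= a;
            }
            sum += p;
        }
        return sum;
    }

    /** Applies the tolerance recommended in the documentation. */
    static boolean looksNormalized(double sum) {
        return Math.abs(1.0d - sum) < 1E-10;
    }

    public static void main(String[] args) {
        double sum = sumOverAllSequences(new double[] {0.1, 0.2, 0.3, 0.4}, 5);
        System.out.println(sum + " normalized: " + looksNormalized(sum));
    }
}
```

Comparing against a tolerance rather than testing for equality with 1.0 avoids false alarms caused by accumulated rounding errors.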

getValueOfAIC

public static double getValueOfAIC(Model m,
                                   Sample s,
                                   int k)
                            throws Exception
This method computes the value of Akaike's Information Criterion (AIC). It uses the formula AIC = 2 * log L(t,x) - 2 * k, where L(t,x) is the likelihood of the sample and k is the number of parameters in the model.

The value of the AIC can be used for model selection.

Parameters:
m - a trained model
s - the sample for the test
k - the number of parameters in the model m
Returns:
value of AIC
Throws:
Exception
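
The formula can be written out directly. The class AicSketch below is hypothetical and takes the log-likelihood as a number instead of a model and a sample. Note that with the sign convention stated above a larger value indicates the preferred model, whereas many textbooks use the negated form 2 * k - 2 * log L, where smaller is better.

```java
// Illustrative sketch only; follows the formula AIC = 2 * log L - 2 * k as documented.
final class AicSketch {

    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }

    public static void main(String[] args) {
        // Two hypothetical candidates: the richer model fits slightly better
        // but pays a larger penalty for its parameters.
        double simple = aic(-104.0, 3);
        double rich = aic(-100.0, 12);
        System.out.println("simple: " + simple + ", rich: " + rich);
        System.out.println("preferred: " + (simple >= rich ? "simple" : "rich"));
    }
}
```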

getValueOfBIC

public static double getValueOfBIC(Model m,
                                   Sample s,
                                   int k)
                            throws Exception
This method computes the value of the Bayesian Information Criterion (BIC). It uses the formula BIC = 2 * log L(t,x) - k * log n, where L(t,x) is the likelihood of the sample, k is the number of parameters in the model and n is the number of sequences in the Sample.

The value of the BIC can be used for model selection.

Parameters:
m - a trained model
s - the sample for the test
k - the number of parameters in the model m
Returns:
value of BIC
Throws:
Exception
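
As a minimal sketch of the formula, the hypothetical class BicSketch below takes the log-likelihood, the parameter count k and the number of sequences n as plain numbers. The natural logarithm for log n is an assumption. Compared with the AIC penalty of 2 * k, the penalty k * log n is larger once n exceeds e^2, i.e. for n >= 8.

```java
// Illustrative sketch only; follows the formula BIC = 2 * log L - k * log n as documented.
final class BicSketch {

    static double bic(double logLikelihood, int k, int n) {
        return 2.0 * logLikelihood - k * Math.log(n);
    }

    public static void main(String[] args) {
        // The same hypothetical model evaluated on samples of different size.
        System.out.println(bic(-100.0, 3, 10));
        System.out.println(bic(-100.0, 3, 1000));
    }
}
```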