StatisticalModelTester

de.jstacs.utils
Class StatisticalModelTester

java.lang.Object
  de.jstacs.utils.StatisticalModelTester

public class StatisticalModelTester
extends Object

This class provides tests for any (discrete) model. It implements several statistics (log-likelihood, Shannon entropy, AIC, BIC, ...) for comparing models.

Author:: Jens Keilwagen
See Also:: StatisticalModel

Constructor Summary
`StatisticalModelTester()`

Method Summary
`static double`	`getKLDivergence(StatisticalModel m1, StatisticalModel m2, int length)` Returns the Kullback-Leibler-divergence `D(p_m1||p_m2)`.
`static double`	`getLogLikelihood(StatisticalModel m, DataSet data)` Returns the log-likelihood of a `DataSet` `data` for a given model `m`.
`static double`	`getLogLikelihood(StatisticalModel m, DataSet data, double[] weights)` Returns the log-likelihood of a `DataSet` `data` for a given model `m`.
`static double`	`getMarginalDistribution(StatisticalModel m, int[] constraint)` This method computes the marginal distribution for any discrete model `m` and all sequences that fulfill the `constraint`, if possible.
`static double`	`getMaxOfDeviation(StatisticalModel m1, StatisticalModel m2, int length)` This method computes the maximum deviation between the probabilities for all sequences of `length` for discrete models `m1` and `m2`.
`static Sequence`	`getMostProbableSequence(SequenceScore m, int length)` Returns one most probable sequence for the discrete model `m`.
`static double`	`getShannonEntropy(StatisticalModel m, int length)` This method computes the Shannon entropy for any discrete model `m` and all sequences of `length`, if possible.
`static double`	`getShannonEntropyInBits(StatisticalModel m, int length)` This method computes the Shannon entropy in bits for any discrete model `m` and all sequences of `length`, if possible.
`static double`	`getSumOfDeviation(StatisticalModel m1, StatisticalModel m2, int length)` This method computes the sum of deviations between the probabilities for all sequences of `length` for discrete models `m1` and `m2`.
`static double`	`getSumOfDistribution(StatisticalModel m, int length)` This method computes the sum of the probabilities of all sequences of `length` for any discrete model `m`, if possible.
`static double`	`getSymKLDivergence(StatisticalModel m1, StatisticalModel m2, int length)` Returns the sum of the Kullback-Leibler-divergences, i.e. `D(p_m1||p_m2) + D(p_m2||p_m1)`.
`static double`	`getValueOfAIC(StatisticalModel m, DataSet s, int k)` This method computes the value of Akaike's Information Criterion (AIC).
`static double`	`getValueOfBIC(StatisticalModel m, DataSet s, int k)` This method computes the value of the Bayesian Information Criterion (BIC).

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

StatisticalModelTester

public StatisticalModelTester()

Method Detail

getKLDivergence

public static double getKLDivergence(StatisticalModel m1,
                                     StatisticalModel m2,
                                     int length)
                              throws Exception

Returns the Kullback-Leibler-divergence D(p_m1||p_m2).

Computes \sum_x p(x|m1) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:: m1 - one discrete model; m2 - another discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the Kullback-Leibler-divergence
Throws:: Exception - if something went wrong
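
The sum above can be sketched in plain Java for two distributions given as explicit probability arrays over the same enumerated sequences. This is an illustration only, not the Jstacs implementation: the class `KlSketch` and its method are hypothetical names, the natural logarithm is an assumption, and terms with p(x|m1) = 0 are assumed to contribute 0.

```java
final class KlSketch {
    // Computes sum_x p[x] * log(p[x] / q[x]) with the natural logarithm (assumption).
    // p and q hold the probabilities of the same sequences in the same order.
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions differ in size");
        }
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0.0) { // convention: 0 * log(0 / q) = 0
                sum += p[x] * Math.log(p[x] / q[x]);
            }
        }
        return sum;
    }
}
```

The number of sequences grows exponentially with `length`, so an exhaustive sum of this kind is only practical for short sequences and small alphabets.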

getSymKLDivergence

public static double getSymKLDivergence(StatisticalModel m1,
                                        StatisticalModel m2,
                                        int length)
                                 throws Exception

Returns the sum of the Kullback-Leibler-divergences, i.e. D(p_m1||p_m2) + D(p_m2||p_m1).

Computes \sum_x (p(x|m1)-p(x|m2)) * \log \frac{p(x|m1)}{p(x|m2)}.

Parameters:: m1 - one discrete model; m2 - another discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the sum of the Kullback-Leibler-divergences
Throws:: Exception - if something went wrong
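
Expanding the computed expression term by term gives D(p_m1||p_m2) + D(p_m2||p_m1), which is symmetric in m1 and m2. The following sketch checks that identity numerically on explicit probability arrays. It is not Jstacs code: `SymKlSketch` and its methods are hypothetical, the natural logarithm is an assumption, and all probabilities are assumed to be strictly positive.

```java
final class SymKlSketch {
    // sum_x (p[x] - q[x]) * log(p[x] / q[x]); assumes all entries are > 0.
    static double symKl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += (p[x] - q[x]) * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    // Plain divergence D(p||q), used here only to check the identity.
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += p[x] * Math.log(p[x] / q[x]);
        }
        return sum;
    }
}
```

Up to rounding, `symKl(p, q)` agrees with `kl(p, q) + kl(q, p)` and with `symKl(q, p)`.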

getLogLikelihood

public static double getLogLikelihood(StatisticalModel m,
                                      DataSet data)
                               throws Exception

Returns the log-likelihood of a DataSet data for a given model m.

Parameters:: m - the given model; data - the DataSet
Returns:: the log-likelihood of data
Throws:: Exception - if something went wrong

getLogLikelihood

public static double getLogLikelihood(StatisticalModel m,
                                      DataSet data,
                                      double[] weights)
                               throws Exception

Returns the log-likelihood of a DataSet data for a given model m.

Parameters:: m - the given model; data - the DataSet; weights - the weight for each element of the DataSet
Returns:: the log-likelihood of data
Throws:: Exception - if something went wrong
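
A common reading of a weighted log-likelihood is \sum_i w_i * \log p(x_i|m). The text above does not spell out the formula, so the sketch below assumes that reading. It works on precomputed per-sequence log-probabilities instead of a model and a DataSet; `LogLikelihoodSketch` is a hypothetical name, and treating a missing weights array as weight 1 for every element is also an assumption.

```java
final class LogLikelihoodSketch {
    // Sums weights[i] * logProbs[i]; a null weights array means weight 1 everywhere (assumption).
    static double logLikelihood(double[] logProbs, double[] weights) {
        if (weights != null && weights.length != logProbs.length) {
            throw new IllegalArgumentException("one weight per element is required");
        }
        double sum = 0.0;
        for (int i = 0; i < logProbs.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            sum += w * logProbs[i];
        }
        return sum;
    }
}
```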

getMarginalDistribution

public static double getMarginalDistribution(StatisticalModel m,
                                             int[] constraint)
                                      throws Exception

This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible.

Parameters:: m - a discrete model; constraint - constraint[i] < 0 stands for an irrelevant position, constraint[i] = c with 0 <= c < m.getAlphabets()[(m.getLength()==0)?0:i].getAlphabetLength() is the encoded character of position i
Returns:: the marginal distribution for a discrete model
Throws:: Exception - if something went wrong
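
The constraint encoding can be illustrated with a brute-force sum over all sequences that agree with the fixed positions. This is a sketch under assumptions, not the Jstacs implementation: the model is replaced by a hypothetical function from an encoded sequence to its probability, and a single alphabet size is assumed for every position.

```java
import java.util.function.ToDoubleFunction;

final class MarginalSketch {
    // Sums prob(seq) over all sequences matching the constraint;
    // constraint[i] < 0 leaves position i free, otherwise position i is fixed to constraint[i].
    static double marginal(ToDoubleFunction<int[]> prob, int[] constraint, int alphabetSize) {
        return sum(prob, constraint, alphabetSize, new int[constraint.length], 0);
    }

    // Fills seq position by position and evaluates prob once per complete sequence.
    private static double sum(ToDoubleFunction<int[]> prob, int[] constraint,
                              int alphabetSize, int[] seq, int pos) {
        if (pos == seq.length) {
            return prob.applyAsDouble(seq);
        }
        if (constraint[pos] >= 0) {
            seq[pos] = constraint[pos];
            return sum(prob, constraint, alphabetSize, seq, pos + 1);
        }
        double total = 0.0;
        for (int c = 0; c < alphabetSize; c++) {
            seq[pos] = c;
            total += sum(prob, constraint, alphabetSize, seq, pos + 1);
        }
        return total;
    }
}
```

With every position free, the result is the sum over the whole distribution; with every position fixed, it is the probability of a single sequence.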

getMaxOfDeviation

public static double getMaxOfDeviation(StatisticalModel m1,
                                       StatisticalModel m2,
                                       int length)
                                throws Exception

This method computes the maximum deviation between the probabilities for all sequences of length for discrete models m1 and m2.

Parameters:: m1 - one discrete model; m2 - another discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the maximum deviation between the probabilities
Throws:: Exception - if something went wrong

getMostProbableSequence

public static Sequence getMostProbableSequence(SequenceScore m,
                                               int length)
                                        throws Exception

Returns one most probable sequence for the discrete model m. (There may be more than one most probable sequence. In this case only one of them is returned.)

This is only a standard implementation. For some special models like Markov models it is possible to find a most probable sequence much faster by using a dynamic programming algorithm.

Parameters:: m - the discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: one most probable sequence
Throws:: Exception - if something went wrong
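
A generic search of this kind can be sketched as an exhaustive enumeration that keeps the best-scoring sequence seen so far. The code below is illustrative and not taken from Jstacs: the score function stands in for the model, a single alphabet size is assumed for all positions, and the tie-breaking rule (first maximum found) is an assumption.

```java
import java.util.function.ToDoubleFunction;

final class MostProbableSketch {
    // Returns one sequence with maximal score; ties go to the first one found.
    static int[] mostProbable(ToDoubleFunction<int[]> score, int length, int alphabetSize) {
        int[] seq = new int[length]; // enumeration starts at 0,0,...,0
        int[] best = seq.clone();
        double bestScore = score.applyAsDouble(seq);
        while (increment(seq, alphabetSize)) {
            double s = score.applyAsDouble(seq);
            if (s > bestScore) {
                bestScore = s;
                best = seq.clone();
            }
        }
        return best;
    }

    // Advances seq like an odometer; returns false once every sequence was visited.
    private static boolean increment(int[] seq, int alphabetSize) {
        for (int i = seq.length - 1; i >= 0; i--) {
            if (++seq[i] < alphabetSize) {
                return true;
            }
            seq[i] = 0;
        }
        return false;
    }
}
```

The loop evaluates alphabetSize^length sequences, which is why a model-specific dynamic programming approach is preferable where one exists.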

getShannonEntropy

public static double getShannonEntropy(StatisticalModel m,
                                       int length)
                                throws Exception

This method computes the Shannon entropy for any discrete model m and all sequences of length, if possible.

Parameters:: m - the discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the Shannon entropy for a discrete model
Throws:: Exception - if something went wrong

getShannonEntropyInBits

public static double getShannonEntropyInBits(StatisticalModel m,
                                             int length)
                                      throws Exception

This method computes the Shannon entropy in bits for any discrete model m and all sequences of length, if possible.

Parameters:: m - the discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the Shannon entropy in bits for a discrete model
Throws:: Exception - if something went wrong
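
The relation between the two entropy variants can be sketched on an explicit probability array. The sketch assumes that the plain variant uses the natural logarithm, which the text above does not state; `EntropySketch` and its methods are hypothetical names and not part of Jstacs.

```java
final class EntropySketch {
    // H(p) = -sum_x p[x] * log(p[x]) in nats; zero-probability entries contribute 0.
    static double entropyNats(double[] p) {
        double sum = 0.0;
        for (double px : p) {
            if (px > 0.0) {
                sum -= px * Math.log(px);
            }
        }
        return sum;
    }

    // Converts nats to bits by dividing by ln 2.
    static double entropyBits(double[] p) {
        return entropyNats(p) / Math.log(2.0);
    }
}
```

For a uniform distribution over four sequences, the entropy is log 4 in nats, which corresponds to 2 bits.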

getSumOfDeviation

public static double getSumOfDeviation(StatisticalModel m1,
                                       StatisticalModel m2,
                                       int length)
                                throws Exception

This method computes the sum of deviations between the probabilities for all sequences of length for discrete models m1 and m2.

Parameters:: m1 - one discrete model; m2 - another discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the sum of deviations between the probabilities
Throws:: Exception - if something went wrong
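
The text does not define "deviation" precisely. Assuming it means the absolute difference |p(x|m1) - p(x|m2)| per sequence, the sum and the maximum (as in getMaxOfDeviation) could be sketched as follows on explicit probability arrays. `DeviationSketch` is a hypothetical class, not Jstacs code.

```java
final class DeviationSketch {
    // Sum of |p[x] - q[x]| over all sequences (absolute difference is an assumption).
    static double sumOfDeviation(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += Math.abs(p[x] - q[x]);
        }
        return sum;
    }

    // Largest |p[x] - q[x]| over all sequences.
    static double maxOfDeviation(double[] p, double[] q) {
        double max = 0.0;
        for (int x = 0; x < p.length; x++) {
            max = Math.max(max, Math.abs(p[x] - q[x]));
        }
        return max;
    }
}
```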

getSumOfDistribution

public static double getSumOfDistribution(StatisticalModel m,
                                          int length)
                                   throws Exception

This method computes the sum of the probabilities of all sequences of length for any discrete model m, if possible. It can therefore be used as a hint whether a model defines a proper distribution or whether the implementation contains mistakes.

The expected value is 1.0, but because of limited floating-point precision the result is rarely exactly 1.0.

Math.abs( 1.0d - getSumOfDistribution( m, length ) ) should be smaller than 1E-10.

Parameters:: m - the discrete model; length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())
Returns:: the sum of the probabilities of all sequences of length for a discrete model
Throws:: Exception - if something went wrong
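
The tolerance check can be sketched on a plain array of probabilities. The class `SumCheckSketch` and its methods are hypothetical and only mirror the comparison suggested above; they do not call Jstacs.

```java
final class SumCheckSketch {
    // Tolerance taken from the recommendation above.
    static final double TOLERANCE = 1E-10;

    // Adds up the probabilities in the given order.
    static double sum(double[] probs) {
        double total = 0.0;
        for (double p : probs) {
            total += p;
        }
        return total;
    }

    // True if the sum is within TOLERANCE of 1.0.
    static boolean looksNormalized(double[] probs) {
        return Math.abs(1.0d - sum(probs)) < TOLERANCE;
    }
}
```

Comparing against a tolerance rather than testing `sum == 1.0` avoids false alarms caused by rounding in the accumulated sum.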

getValueOfAIC

public static double getValueOfAIC(StatisticalModel m,
                                   DataSet s,
                                   int k)
                            throws Exception

This method computes the value of Akaike's Information Criterion (AIC). It uses the formula: AIC = 2 * log L(t,x) - 2 * k, where L(t,x) is the likelihood of the DataSet and k is the number of parameters in the model.

The value of the AIC can be used for model selection.

Parameters:: m - a trained model; s - the DataSet for the test; k - the number of parameters of the model m
Returns:: the value of AIC
Throws:: Exception - if something went wrong
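
The arithmetic of the formula can be sketched with a precomputed log-likelihood. `AicSketch` is a hypothetical helper, not Jstacs code. Note that with this sign convention a larger value indicates the preferred model, in contrast to the more common form 2 * k - 2 * log L.

```java
final class AicSketch {
    // AIC = 2 * log L - 2 * k, following the formula stated above.
    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }
}
```

For two models with the same log-likelihood, the one with fewer parameters receives the larger value.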

getValueOfBIC

public static double getValueOfBIC(StatisticalModel m,
                                   DataSet s,
                                   int k)
                            throws Exception

This method computes the value of the Bayesian Information Criterion (BIC). It uses the formula: BIC = 2 * log L(t,x) - k * log n, where L(t,x) is the likelihood of the DataSet, k is the number of parameters in the model and n is the number of sequences in the DataSet.

The value of the BIC can be used for model selection.

Parameters:: m - a trained model; s - the DataSet for the test; k - the number of parameters of the model m
Returns:: the value of BIC
Throws:: Exception - if something went wrong
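
The arithmetic of the BIC formula can be sketched with a precomputed log-likelihood. `BicSketch` is a hypothetical helper, not Jstacs code, and the natural logarithm for log n is an assumption.

```java
final class BicSketch {
    // BIC = 2 * log L - k * log n, with n the number of sequences (natural log assumed).
    static double bic(double logLikelihood, int k, int n) {
        return 2.0 * logLikelihood - k * Math.log(n);
    }
}
```

The penalty per parameter is log n, so it grows with the size of the data set, whereas the AIC penalty per parameter stays fixed at 2.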
