public class StatisticalModelTester extends Object
See also: StatisticalModel

| Constructor and Description |
|---|
| StatisticalModelTester() |

| Modifier and Type | Method and Description |
|---|---|
| static double | getKLDivergence(StatisticalModel m1, StatisticalModel m2, int length): Returns the Kullback-Leibler divergence D(p_m1\|\|p_m2). |
| static double | getLogLikelihood(StatisticalModel m, DataSet data) |
| static double | getLogLikelihood(StatisticalModel m, DataSet data, double[] weights) |
| static double[] | getMarginalDistribution(StatisticalModel m, int[]... constraint): This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible. |
| static double | getMaxOfDeviation(StatisticalModel m1, StatisticalModel m2, int length): This method computes the maximum deviation between the probabilities for all sequences of the given length for discrete models m1 and m2. |
| static Sequence | getMostProbableSequence(SequenceScore m, int length): Returns one most probable sequence for the discrete model m. |
| static double | getShannonEntropy(StatisticalModel m, int length): This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible. |
| static double | getShannonEntropyInBits(StatisticalModel m, int length): This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible. |
| static double | getSumOfDeviation(StatisticalModel m1, StatisticalModel m2, int length): This method computes the sum of deviations between the probabilities for all sequences of the given length for discrete models m1 and m2. |
| static double | getSumOfDistribution(StatisticalModel m, int length): This method computes the sum of the probabilities of all sequences of the given length for any discrete model m, if possible. |
| static double | getSymKLDivergence(StatisticalModel m1, StatisticalModel m2, int length): Returns the symmetric Kullback-Leibler divergence, i.e. D(p_m1\|\|p_m2) + D(p_m2\|\|p_m1). |
| static double | getValueOfAIC(StatisticalModel m, DataSet s, int k): This method computes the value of Akaike's Information Criterion (AIC). |
| static double | getValueOfBIC(StatisticalModel m, DataSet s, int k): This method computes the value of the Bayesian Information Criterion (BIC). |

public static double getKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) throws Exception

Returns the Kullback-Leibler divergence D(p_m1||p_m2), defined as

D(p_m1||p_m2) = \sum_x p(x|m1) \cdot \log \frac{p(x|m1)}{p(x|m2)}

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getSymKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) throws Exception

Returns the symmetric Kullback-Leibler divergence, i.e. the sum D(p_m1||p_m2) + D(p_m2||p_m1), which equals

\sum_x (p(x|m1) - p(x|m2)) \cdot \log \frac{p(x|m1)}{p(x|m2)}

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getLogLikelihood(StatisticalModel m, DataSet data) throws Exception

public static double getLogLikelihood(StatisticalModel m, DataSet data, double[] weights) throws Exception
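
The two getLogLikelihood variants carry no description here. A plausible reading, stated as an assumption rather than as the library's behaviour, is that the log-likelihood is the sum of the log-probabilities of all sequences in the data set, with each term multiplied by its weight in the weighted variant. The sketch below illustrates that reading with plain arrays; LogLikelihoodSketch, its method, and the toy model are hypothetical and do not use StatisticalModel or DataSet.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical helper illustrating a (weighted) log-likelihood; not part of the documented API. */
final class LogLikelihoodSketch {

    /**
     * Sums weights[i] * logProb(data[i]) over all sequences.
     * Assumption: a null weights array means every sequence has weight 1.
     */
    static double logLikelihood(ToDoubleFunction<int[]> logProb, int[][] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("need exactly one weight per sequence");
        }
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            sum += w * logProb.applyAsDouble(data[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy model: independent positions over a binary alphabet, P(0) = 0.25, P(1) = 0.75.
        double[] logP = { Math.log(0.25), Math.log(0.75) };
        ToDoubleFunction<int[]> model = seq -> {
            double lp = 0.0;
            for (int symbol : seq) {
                lp += logP[symbol];
            }
            return lp;
        };
        int[][] data = { { 0, 1 }, { 1, 1 } };
        System.out.println(logLikelihood(model, data, null));                        // unweighted
        System.out.println(logLikelihood(model, data, new double[] { 2.0, 0.5 }));   // weighted
    }
}
```
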

public static double[] getMarginalDistribution(StatisticalModel m, int[]... constraint) throws Exception

This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible.

Parameters:
m - a discrete model
constraint - constraint[j][i] < 0 stands for an irrelevant position; constraint[j][i] = c with 0 <= c < m.getAlphabets()[(m.getLength()==0)?0:i].getAlphabetLength() is the encoded character of position i

Throws:
Exception - if something went wrong

public static double getMaxOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) throws Exception

This method computes the maximum deviation between the probabilities for all sequences of the given length for discrete models m1 and m2.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static Sequence getMostProbableSequence(SequenceScore m, int length) throws Exception

Returns one most probable sequence for the discrete model m. (There may be more than one most probable sequence. In this case only one of them is returned.)
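
To illustrate what "one most probable sequence" means when several sequences tie, the following sketch enumerates every sequence of a given length over a small alphabet and keeps the first one with the highest score. It is only an illustration under stated assumptions: MostProbableSequenceSketch and the function-based scorer are hypothetical, sequences are plain int arrays rather than Sequence objects, and nothing here claims to mirror how the library searches. Exhaustive enumeration visits alphabetSize^length sequences, so it is only practical for short lengths.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical brute-force search for a highest-scoring sequence; not part of the documented API. */
final class MostProbableSequenceSketch {

    /** Enumerates all alphabetSize^length sequences and returns the first one with maximal score. */
    static int[] mostProbable(ToDoubleFunction<int[]> logScore, int alphabetSize, int length) {
        int[] current = new int[length];              // starts at 0,0,...,0
        int[] best = current.clone();
        double bestScore = logScore.applyAsDouble(current);
        while (increment(current, alphabetSize)) {
            double score = logScore.applyAsDouble(current);
            if (score > bestScore) {                  // strict '>' keeps the first of several tied sequences
                bestScore = score;
                best = current.clone();
            }
        }
        return best;
    }

    /** Advances seq like an odometer; returns false once every sequence has been visited. */
    private static boolean increment(int[] seq, int alphabetSize) {
        for (int i = seq.length - 1; i >= 0; i--) {
            if (++seq[i] < alphabetSize) {
                return true;
            }
            seq[i] = 0;
        }
        return false;
    }
}
```

With a scorer that assigns the same value to every sequence, this sketch returns the all-zero sequence, which is one of the equally probable candidates.
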

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getShannonEntropy(StatisticalModel m, int length) throws Exception

This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getShannonEntropyInBits(StatisticalModel m, int length) throws Exception

This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getSumOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) throws Exception

This method computes the sum of deviations between the probabilities for all sequences of the given length for discrete models m1 and m2.

Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getSumOfDistribution(StatisticalModel m, int length) throws Exception

This method computes the sum of the probabilities of all sequences of the given length for any discrete model m, if possible. It can therefore be used as a hint whether a model defines a proper distribution or whether the implementation contains mistakes: Math.abs( 1.0d - getSumOfDistribution( m, length ) ) should be smaller than 1E-10.

Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models length has to be SequenceScore.getLength())

Throws:
Exception - if something went wrong

public static double getValueOfAIC(StatisticalModel m, DataSet s, int k) throws Exception

This method computes the value of Akaike's Information Criterion (AIC), given here as 2 * log L(t,x) - 2*k, where L(t,x) is the likelihood of the DataSet and k is the number of parameters in the model.
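
As a worked illustration of the expression above, the sketch below evaluates 2 * log L - 2*k for two made-up models. AicSketch and all numbers are hypothetical, and the log-likelihood is passed in directly instead of being computed from a StatisticalModel and a DataSet. Note that this form is the negative of the more common 2k - 2 log L, so under the form stated here a larger value would indicate the preferred model.

```java
/** Hypothetical helper evaluating the AIC expression stated above; not part of the documented API. */
final class AicSketch {

    /** Returns 2 * logLikelihood - 2 * k, where k is the number of model parameters. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }

    public static void main(String[] args) {
        // Made-up values: the larger model fits slightly better but pays for five extra parameters.
        System.out.println(aic(-120.0, 3));   // prints -246.0
        System.out.println(aic(-118.5, 8));   // prints -253.0
    }
}
```
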
public static double getValueOfBIC(StatisticalModel m, DataSet s, int k) throws Exception

This method computes the value of the Bayesian Information Criterion (BIC), given here as 2 * log L(t,x) - k * log n, where L(t,x) is the likelihood of the DataSet, k is the number of parameters in the model and n is the number of sequences in the DataSet.
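
The expression above can be sketched in the same way. BicSketch is a hypothetical helper, the log-likelihood is supplied directly, and the natural logarithm is assumed for log n because the text does not name a base. Under that assumption the penalty k * log n exceeds the 2*k penalty of the AIC expression as soon as log n > 2, that is for n >= 8 sequences.

```java
/** Hypothetical helper evaluating the BIC expression stated above; not part of the documented API. */
final class BicSketch {

    /**
     * Returns 2 * logLikelihood - k * ln(n).
     * Assumption: "log" in the description is the natural logarithm.
     */
    static double bic(double logLikelihood, int k, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 2.0 * logLikelihood - k * Math.log(n);
    }

    public static void main(String[] args) {
        // Made-up values: same fit and parameter count, growing number of sequences.
        for (int n : new int[] { 1, 7, 8, 1000 }) {
            System.out.println("n=" + n + " -> " + bic(-120.0, 3, n));
        }
    }
}
```
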