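public class StatisticalModelTester extends Object
A collection of static methods for computing properties of discrete StatisticalModel instances, such as divergences, entropies, likelihoods, and information criteria.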
Constructor and Description |
---|
StatisticalModelTester() |
Modifier and Type | Method and Description |
---|---|
static double | getKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) Returns the Kullback-Leibler divergence D(p_m1\|\|p_m2). |
static double | getLogLikelihood(StatisticalModel m, DataSet data) |
static double | getLogLikelihood(StatisticalModel m, DataSet data, double[] weights) |
static double[] | getMarginalDistribution(StatisticalModel m, int[]... constraint) This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible. |
static double | getMaxOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) This method computes the maximum deviation between the probabilities that the discrete models m1 and m2 assign to the sequences of the given length. |
static Sequence | getMostProbableSequence(SequenceScore m, int length) Returns one most probable sequence for the discrete model m. |
static double | getShannonEntropy(StatisticalModel m, int length) This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible. |
static double | getShannonEntropyInBits(StatisticalModel m, int length) This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible. |
static double | getSumOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) This method computes the sum of the deviations between the probabilities that the discrete models m1 and m2 assign to the sequences of the given length. |
static double | getSumOfDistribution(StatisticalModel m, int length) This method computes the sum of the probabilities of all sequences of the given length under any discrete model m, if possible. |
static double | getSymKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) Returns the symmetrized Kullback-Leibler divergence, i.e. the sum D(p_m1\|\|p_m2) + D(p_m2\|\|p_m1). |
static double | getValueOfAIC(StatisticalModel m, DataSet s, int k) This method computes the value of Akaike's Information Criterion (AIC). |
static double | getValueOfBIC(StatisticalModel m, DataSet s, int k) This method computes the value of the Bayesian Information Criterion (BIC). |
public static double getKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) throws Exception
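Returns the Kullback-Leibler divergence D(p_m1||p_m2), i.e.
D(p_m1||p_m2) = \sum_x p(x|m1) \log \frac{p(x|m1)}{p(x|m2)},
where the sum runs over all sequences x of the given length.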
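To make the definition concrete, here is a minimal brute-force sketch that evaluates the sum by enumerating every sequence of the given length. It does not use the StatisticalModel API: the toy model (a hypothetical i.i.d. model given as an array of symbol probabilities) and all names are assumptions for illustration, the natural logarithm is assumed, and the library may compute the divergence differently.

```java
/** Brute-force sketch of D(p_m1||p_m2) for hypothetical i.i.d. toy models. */
final class KLDivergenceSketch {

    /** Probability of seq under a toy model whose positions are i.i.d. with the given symbol probabilities. */
    static double prob(double[] symbolProbs, int[] seq) {
        double p = 1.0;
        for (int c : seq) {
            p *= symbolProbs[c];
        }
        return p;
    }

    /** Advances seq to the next sequence in lexicographic order; returns false once all have been visited. */
    static boolean next(int[] seq, int alphabetSize) {
        for (int i = seq.length - 1; i >= 0; i--) {
            if (++seq[i] < alphabetSize) {
                return true;
            }
            seq[i] = 0; // carry over to the previous position
        }
        return false;
    }

    /** Sums p(x|m1) * ln(p(x|m1) / p(x|m2)) over all sequences x of the given length. */
    static double klDivergence(double[] m1, double[] m2, int length) {
        int[] seq = new int[length]; // starts at the all-zero sequence
        double sum = 0.0;
        do {
            double p1 = prob(m1, seq);
            double p2 = prob(m2, seq);
            if (p1 > 0.0) { // terms with p(x|m1) = 0 contribute 0 by convention
                sum += p1 * Math.log(p1 / p2); // becomes +Infinity if p2 == 0
            }
        } while (next(seq, m1.length));
        return sum;
    }

    public static void main(String[] args) {
        double[] m1 = {0.5, 0.5};
        double[] m2 = {0.9, 0.1};
        // for i.i.d. models the divergence grows linearly with the length: here 3 * ln(5/3)
        System.out.println(klDivergence(m1, m2, 3));
    }
}
```

Because the toy models are i.i.d., the result equals length times the per-symbol divergence, which makes the sketch easy to check by hand.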
Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequences (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getSymKLDivergence(StatisticalModel m1, StatisticalModel m2, int length) throws Exception
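Returns the symmetrized Kullback-Leibler divergence, i.e. the sum D(p_m1||p_m2) + D(p_m2||p_m1), which can be written as
\sum_x (p(x|m1) - p(x|m2)) \log \frac{p(x|m1)}{p(x|m2)},
where the sum runs over all sequences x of the given length.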
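The identity \sum_x (p(x) - q(x)) \log(p(x)/q(x)) = D(p||q) + D(q||p) can be checked numerically. The sketch below works on two hypothetical, already enumerated distributions over the same outcomes (arrays of strictly positive probabilities); the class and method names are illustrative, not the library's implementation.

```java
/** Sketch: the symmetrized KL divergence equals D(p||q) + D(q||p). */
final class SymmetricKLSketch {

    /** D(p||q) = sum_x p(x) * ln(p(x) / q(x)); zero-probability terms of p are skipped. */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0.0) {
                sum += p[x] * Math.log(p[x] / q[x]);
            }
        }
        return sum;
    }

    /** sum_x (p(x) - q(x)) * ln(p(x) / q(x)); assumes all probabilities are strictly positive. */
    static double symKl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            sum += (p[x] - q[x]) * Math.log(p[x] / q[x]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.2, 0.3, 0.5};
        double[] q = {0.4, 0.4, 0.2};
        System.out.println(symKl(p, q));         // symmetrized divergence
        System.out.println(kl(p, q) + kl(q, p)); // same value up to rounding
    }
}
```

Every term (p(x) - q(x)) \log(p(x)/q(x)) is non-negative, so the value is symmetric in its arguments and zero only for identical distributions.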
Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequences (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getLogLikelihood(StatisticalModel m, DataSet data) throws Exception
public static double getLogLikelihood(StatisticalModel m, DataSet data, double[] weights) throws Exception
public static double[] getMarginalDistribution(StatisticalModel m, int[]... constraint) throws Exception
This method computes the marginal distribution for any discrete model m and all sequences that fulfill the constraint, if possible.
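The constraint encoding can be illustrated with a brute-force sketch: every sequence whose characters agree with the constraint at all relevant positions (entries >= 0) contributes its probability, while positions marked with a negative entry are summed out. The toy i.i.d. model, the class name, and the choice to return one probability for a single constraint row are assumptions; the library method accepts several constraint rows and returns a double[] whose exact layout is not specified here.

```java
/** Brute-force sketch of marginalizing a hypothetical i.i.d. toy model under one constraint row. */
final class MarginalSketch {

    /** Probability of seq when every position independently follows symbolProbs. */
    static double prob(double[] symbolProbs, int[] seq) {
        double p = 1.0;
        for (int c : seq) {
            p *= symbolProbs[c];
        }
        return p;
    }

    /** True if seq agrees with the constraint; negative entries mark irrelevant positions. */
    static boolean fulfills(int[] seq, int[] constraint) {
        for (int i = 0; i < seq.length; i++) {
            if (constraint[i] >= 0 && seq[i] != constraint[i]) {
                return false;
            }
        }
        return true;
    }

    /** Sums the probabilities of all sequences of length constraint.length that fulfill the constraint. */
    static double marginal(double[] symbolProbs, int[] constraint) {
        int a = symbolProbs.length;
        int[] seq = new int[constraint.length];
        double sum = 0.0;
        while (true) {
            if (fulfills(seq, constraint)) {
                sum += prob(symbolProbs, seq);
            }
            // advance to the next sequence (odometer-style); stop after the last one
            int i = seq.length - 1;
            while (i >= 0 && ++seq[i] == a) {
                seq[i] = 0;
                i--;
            }
            if (i < 0) {
                return sum;
            }
        }
    }

    public static void main(String[] args) {
        double[] dna = {0.1, 0.2, 0.3, 0.4}; // hypothetical probabilities of the encoded characters 0..3
        // position 1 must be character 2; positions 0 and 2 are irrelevant
        System.out.println(marginal(dna, new int[] {-1, 2, -1}));
    }
}
```

For an i.i.d. model the irrelevant positions sum to one, so the result reduces to the product of the probabilities at the constrained positions.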
Parameters:
m - a discrete model
constraint - constraint[j][i] < 0 stands for an irrelevant position; constraint[j][i] = c with 0 <= c < m.getAlphabets()[(m.getLength()==0)?0:i].getAlphabetLength() is the encoded character of position i
Throws:
Exception - if something went wrong
public static double getMaxOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) throws Exception
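This method computes the maximum deviation between the probabilities that the discrete models m1 and m2 assign to the sequences of the given length.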
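A minimal sketch, assuming that the deviation of a sequence x is the absolute difference |p(x|m1) - p(x|m2)|: it enumerates all a^length sequences of two hypothetical i.i.d. toy models by decoding each index in base a and keeps the largest deviation. Names and the toy model are illustrative only.

```java
/** Sketch: maximum absolute deviation between two hypothetical i.i.d. toy models. */
final class MaxDeviationSketch {

    /** Probability of the sequence encoded by index (base-a digits, one per position). */
    static double prob(double[] symbolProbs, long index, int length) {
        int a = symbolProbs.length;
        double p = 1.0;
        for (int i = 0; i < length; i++) {
            p *= symbolProbs[(int) (index % a)];
            index /= a;
        }
        return p;
    }

    /** max over all sequences x of |p(x|m1) - p(x|m2)|; assumes both models share one alphabet. */
    static double maxOfDeviation(double[] m1, double[] m2, int length) {
        long count = (long) Math.pow(m1.length, length); // a^length sequences, fine for small inputs
        double max = 0.0;
        for (long index = 0; index < count; index++) {
            double dev = Math.abs(prob(m1, index, length) - prob(m2, index, length));
            max = Math.max(max, dev);
        }
        return max;
    }

    public static void main(String[] args) {
        // length 2: m1 gives 0.25 to every sequence, m2 gives 0.81 to the all-zero sequence
        System.out.println(maxOfDeviation(new double[] {0.5, 0.5}, new double[] {0.9, 0.1}, 2));
    }
}
```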
Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequences (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static Sequence getMostProbableSequence(SequenceScore m, int length) throws Exception
Returns one most probable sequence for the discrete model m. If several sequences are equally probable, only one of them is returned.
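For models with a simple dependency structure, a most probable sequence can be found without enumerating all sequences. The sketch below applies Viterbi-style dynamic programming to a hypothetical first-order Markov model (start and transition probabilities as plain arrays); this is a swapped-in illustration under that assumption, not necessarily how the library handles arbitrary SequenceScore instances. Ties are resolved in favor of smaller symbol codes, so only one of several equally probable sequences is returned.

```java
/** Viterbi-style sketch for a hypothetical first-order Markov model over symbols 0..a-1. */
final class MostProbableSequenceSketch {

    /**
     * start[s] = P(x_0 = s), trans[s][t] = P(x_{i+1} = t | x_i = s).
     * Returns one sequence of the given length with maximal probability.
     */
    static int[] mostProbable(double[] start, double[][] trans, int length) {
        int a = start.length;
        int[][] back = new int[length][a]; // back[i][t]: best predecessor of symbol t at position i
        double[] score = new double[a];    // best log-probability of a prefix ending in each symbol
        for (int s = 0; s < a; s++) {
            score[s] = Math.log(start[s]);
        }
        for (int i = 1; i < length; i++) {
            double[] next = new double[a];
            for (int t = 0; t < a; t++) {
                next[t] = Double.NEGATIVE_INFINITY;
                for (int s = 0; s < a; s++) {
                    double candidate = score[s] + Math.log(trans[s][t]);
                    if (candidate > next[t]) { // strict: ties keep the smaller predecessor code
                        next[t] = candidate;
                        back[i][t] = s;
                    }
                }
            }
            score = next;
        }
        int best = 0;
        for (int s = 1; s < a; s++) {
            if (score[s] > score[best]) {
                best = s;
            }
        }
        int[] seq = new int[length];
        for (int i = length - 1; i >= 0; i--) { // trace the best path backwards
            seq[i] = best;
            best = back[i][best];
        }
        return seq;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.1, 0.9}, {0.8, 0.2}};
        System.out.println(java.util.Arrays.toString(mostProbable(start, trans, 3))); // [0, 1, 0]
    }
}
```

Working in log space avoids underflow for long sequences; the cost is O(length * a^2) instead of O(a^length) for full enumeration.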
Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getShannonEntropy(StatisticalModel m, int length) throws Exception
This method computes the Shannon entropy for any discrete model m and all sequences of the given length, if possible.
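A brute-force sketch of the entropy H = -\sum_x p(x|m) \log p(x|m) over all sequences of the given length, assuming the natural logarithm (so the result is in nats) and a hypothetical i.i.d. toy model. Sequences are enumerated recursively, and zero-probability sequences are skipped because p \log p tends to 0.

```java
/** Recursive brute-force sketch of the Shannon entropy (in nats) of a hypothetical i.i.d. toy model. */
final class EntropySketch {

    /** Enumerates all continuations of a prefix with probability prefixProb and sums -p ln p. */
    static double entropy(double[] symbolProbs, int remaining, double prefixProb) {
        if (remaining == 0) {
            return prefixProb > 0.0 ? -prefixProb * Math.log(prefixProb) : 0.0;
        }
        double h = 0.0;
        for (double pc : symbolProbs) {
            h += entropy(symbolProbs, remaining - 1, prefixProb * pc);
        }
        return h;
    }

    /** Entropy over all sequences of the given length, starting from the empty prefix. */
    static double shannonEntropy(double[] symbolProbs, int length) {
        return entropy(symbolProbs, length, 1.0);
    }

    public static void main(String[] args) {
        // uniform binary model, length 3: eight equally likely sequences, so 3 * ln 2
        System.out.println(shannonEntropy(new double[] {0.5, 0.5}, 3));
    }
}
```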
Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getShannonEntropyInBits(StatisticalModel m, int length) throws Exception
This method computes the Shannon entropy in bits for any discrete model m and all sequences of the given length, if possible.
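Entropy in bits differs from entropy in nats only by the constant factor 1 / ln 2, since log_2 p = ln p / ln 2. A minimal sketch on a hypothetical, already enumerated distribution over sequences (the class and method names are illustrative):

```java
/** Sketch: converting the Shannon entropy from nats to bits. */
final class EntropyBitsSketch {

    static final double LN2 = Math.log(2.0);

    /** Entropy in nats: -sum p ln p, skipping zero probabilities. */
    static double entropyNats(double[] probs) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0.0) {
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** Entropy in bits, obtained by dividing the entropy in nats by ln 2. */
    static double entropyBits(double[] probs) {
        return entropyNats(probs) / LN2;
    }

    public static void main(String[] args) {
        // the four sequences of length 2 under a uniform binary model: 2 bits
        System.out.println(entropyBits(new double[] {0.25, 0.25, 0.25, 0.25}));
    }
}
```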
Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getSumOfDeviation(StatisticalModel m1, StatisticalModel m2, int length) throws Exception
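This method computes the sum of the deviations between the probabilities that the discrete models m1 and m2 assign to the sequences of the given length.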
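A minimal sketch, assuming absolute deviations (signed deviations would always sum to zero for two normalized models), for two hypothetical distributions given as explicit probability tables over the same enumerated sequences. Under that assumption the value equals twice the total variation distance.

```java
/** Sketch: sum of absolute deviations between two enumerated distributions. */
final class SumOfDeviationSketch {

    /** Sum over x of |p1[x] - p2[x]|; both arrays list the same sequences in the same order. */
    static double sumOfDeviation(double[] p1, double[] p2) {
        double sum = 0.0;
        for (int x = 0; x < p1.length; x++) {
            sum += Math.abs(p1[x] - p2[x]);
        }
        return sum;
    }

    /** Total variation distance, i.e. half of the summed absolute deviations. */
    static double totalVariation(double[] p1, double[] p2) {
        return 0.5 * sumOfDeviation(p1, p2);
    }

    public static void main(String[] args) {
        // sequences 00, 01, 10, 11 under a uniform binary model and under an i.i.d. model with P(0) = 0.9
        double[] p1 = {0.25, 0.25, 0.25, 0.25};
        double[] p2 = {0.81, 0.09, 0.09, 0.01};
        System.out.println(sumOfDeviation(p1, p2)); // approximately 1.12
    }
}
```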
Parameters:
m1 - one discrete model
m2 - another discrete model
length - the length of the sequences (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getSumOfDistribution(StatisticalModel m, int length) throws Exception
This method computes the sum of the probabilities of all sequences of the given length under any discrete model m, if possible. It can therefore be used to check whether a model defines a proper distribution or whether its implementation contains mistakes: Math.abs(1.0d - getSumOfDistribution(m, length)) should be smaller than 1E-10.
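The check can be sketched for a hypothetical first-order Markov toy model (start vector and transition matrix as plain arrays). Instead of enumerating all sequences, the total probability is accumulated position by position in a forward pass and then compared to 1 with the 1E-10 tolerance given above. A transition row that does not sum to 1, i.e. an implementation mistake, is detected. Names and the model are assumptions for illustration.

```java
/** Sketch: checking that a hypothetical first-order Markov model sums to one over all sequences. */
final class SumOfDistributionSketch {

    /** Total probability of all sequences of the given length, via a forward pass. */
    static double sumOfDistribution(double[] start, double[][] trans, int length) {
        if (length == 0) {
            return 1.0; // only the empty sequence
        }
        double[] forward = start.clone(); // total probability of all prefixes ending in each symbol
        for (int i = 1; i < length; i++) {
            double[] next = new double[start.length];
            for (int a = 0; a < start.length; a++) {
                for (int b = 0; b < start.length; b++) {
                    next[b] += forward[a] * trans[a][b];
                }
            }
            forward = next;
        }
        double sum = 0.0;
        for (double f : forward) {
            sum += f;
        }
        return sum;
    }

    /** Applies the tolerance from the documentation: |1 - sum| < 1E-10. */
    static boolean looksLikeDistribution(double[] start, double[][] trans, int length) {
        return Math.abs(1.0d - sumOfDistribution(start, trans, length)) < 1E-10;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] good = {{0.1, 0.9}, {0.8, 0.2}};
        double[][] broken = {{0.1, 0.9}, {0.8, 0.3}}; // second row sums to 1.1
        System.out.println(looksLikeDistribution(start, good, 5));   // true
        System.out.println(looksLikeDistribution(start, broken, 5)); // false
    }
}
```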
Parameters:
m - the discrete model
length - the length of the sequence (for inhomogeneous models, length has to equal SequenceScore.getLength())
Throws:
Exception - if something went wrong
public static double getValueOfAIC(StatisticalModel m, DataSet s, int k) throws Exception
This method computes the value of Akaike's Information Criterion (AIC) as
2 * log L(t,x) - 2*k,
where L(t,x) is the likelihood of the DataSet s under the model m and k is the number of parameters in the model.
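A minimal sketch of the formula above for a hypothetical i.i.d. model fitted by maximum likelihood to sequences over an alphabet of size a, with k = a - 1 free parameters. It follows this document's sign convention 2 log L - 2k, under which larger values are better; the more common textbook form 2k - 2 log L is its negative. The natural logarithm and all names are assumptions.

```java
/** Sketch: AIC (document's sign convention) for a hypothetical ML-fitted i.i.d. model. */
final class AicSketch {

    /** Maximized log-likelihood of an i.i.d. model: sum_c n_c * ln(n_c / N) over symbol counts. */
    static double logLikelihoodIid(int[][] data, int alphabetSize) {
        long[] counts = new long[alphabetSize];
        long total = 0;
        for (int[] seq : data) {
            for (int c : seq) {
                counts[c]++;
                total++;
            }
        }
        double logL = 0.0;
        for (long n : counts) {
            if (n > 0) { // unseen symbols get probability 0 and contribute nothing
                logL += n * Math.log((double) n / total);
            }
        }
        return logL;
    }

    /** AIC in this document's convention: 2 * log L - 2 * k. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }

    public static void main(String[] args) {
        int[][] data = {{0, 0, 1, 1}};
        double logL = logLikelihoodIid(data, 2); // 4 * ln(0.5)
        System.out.println(aic(logL, 1));        // -8 * ln 2 - 2
    }
}
```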
public static double getValueOfBIC(StatisticalModel m, DataSet s, int k) throws Exception
This method computes the value of the Bayesian Information Criterion (BIC) as
2 * log L(t,x) - k * log n,
where L(t,x) is the likelihood of the DataSet s, k is the number of parameters in the model, and n is the number of sequences in the DataSet s.
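The BIC differs from the AIC only in the penalty term, k * log n instead of 2k, so for more than e^2 (about 7.4) sequences it penalizes additional parameters more strongly. A minimal sketch, assuming that the log-likelihood of the DataSet is the sum of per-sequence log-probabilities (given here as a hypothetical array) and that log is the natural logarithm:

```java
/** Sketch: BIC (document's sign convention) from hypothetical per-sequence log-probabilities. */
final class BicSketch {

    /** BIC as 2 * log L - k * log n, with n the number of sequences. */
    static double bic(double[] sequenceLogProbs, int k) {
        double logL = 0.0;
        for (double lp : sequenceLogProbs) {
            logL += lp; // sequences are treated as independent
        }
        int n = sequenceLogProbs.length;
        return 2.0 * logL - k * Math.log(n);
    }

    public static void main(String[] args) {
        double[] logProbs = {-1.0, -2.0, -3.0};
        System.out.println(bic(logProbs, 2)); // -12 - 2 * ln 3
    }
}
```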