de.jstacs.sequenceScores.statisticalModels
Interface StatisticalModel

All Superinterfaces:
Cloneable, SequenceScore, Storable
All Known Subinterfaces:
DifferentiableStatisticalModel, SamplingDifferentiableStatisticalModel, TrainableStatisticalModel, VariableLengthDiffSM
All Known Implementing Classes:
AbstractDifferentiableStatisticalModel, AbstractHMM, AbstractMixtureDiffSM, AbstractMixtureTrainSM, AbstractTrainableStatisticalModel, AbstractVariableLengthDiffSM, BayesianNetworkDiffSM, BayesianNetworkTrainSM, CompositeTrainSM, CyclicMarkovModelDiffSM, DAGTrainSM, DifferentiableHigherOrderHMM, DifferentiableStatisticalModelWrapperTrainSM, DiscreteGraphicalTrainSM, DurationDiffSM, ExtendedZOOPSDiffSM, FSDAGModelForGibbsSampling, FSDAGTrainSM, FSMEManager, HiddenMotifMixture, HigherOrderHMM, HomogeneousDiffSM, HomogeneousMM, HomogeneousMM0DiffSM, HomogeneousMMDiffSM, HomogeneousTrainSM, IndependentProductDiffSM, InhomogeneousDGTrainSM, MappingDiffSM, MarkovModelDiffSM, MarkovRandomFieldDiffSM, MEManager, MixtureDiffSM, MixtureDurationDiffSM, MixtureTrainSM, NormalizedDiffSM, PositionDiffSM, SamplingHigherOrderHMM, SamplingPhyloHMM, SharedStructureMixture, SkewNormalLikeDurationDiffSM, StrandDiffSM, StrandTrainSM, UniformDiffSM, UniformDurationDiffSM, UniformHomogeneousDiffSM, UniformTrainSM, VariableLengthMixtureDiffSM, VariableLengthWrapperTrainSM, ZOOPSTrainSM

public interface StatisticalModel
extends SequenceScore

This interface declares the methods of a statistical model, i.e., a SequenceScore that defines a proper likelihood over the input Sequences. To train the model from a DataSet, see TrainableStatisticalModel; to use the model in an optimization (e.g., discriminative learning with the GenDisMixClassifier), see DifferentiableStatisticalModel.

Author:
Jan Grau, Jens Keilwagen

Method Summary
 DataSet emitDataSet(int numberOfSequences, int... seqLength)
          This method returns a DataSet object containing artificial sequence(s).
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbFor(Sequence sequence)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore
clone, getAlphabetContainer, getCharacteristics, getInstanceName, getLength, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getNumericalCharacteristics, isInitialized, toString
 
Methods inherited from interface de.jstacs.Storable
toXML
 

Method Detail

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos,
                     int endpos)
                     throws Exception
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

This method extends getLogProbFor(Sequence, int): because the model may be, e.g., homogeneous, the length of the sequences whose probability is returned need not be fixed. The end position of the part of the given sequence is therefore given explicitly, and the probability of the part from position startpos to endpos (inclusive) is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model (e.g. startpos or endpos out of range, such as endpos > sequence.length, ...)
NotTrainedException - if the model is not trained yet
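
The inclusive startpos/endpos convention described above can be illustrated with a minimal, self-contained sketch. ToyHomogeneousModel is a hypothetical stand-in and not a Jstacs class: it uses an int[] in place of Sequence, an order-0 homogeneous model, and IllegalArgumentException in place of the checked Exception. Rejecting startpos > endpos is an assumption of this sketch, not documented behavior.

```java
/** Hypothetical stand-in (not a Jstacs class): homogeneous order-0 model over symbols 0..k-1. */
final class ToyHomogeneousModel {
    private final double[] logProbs;

    ToyHomogeneousModel(double[] probs) {
        logProbs = new double[probs.length];
        for (int i = 0; i < probs.length; i++) {
            logProbs[i] = Math.log(probs[i]);
        }
    }

    /** Log-probability of seq[startpos..endpos], both ends inclusive. */
    double getLogProbFor(int[] seq, int startpos, int endpos) {
        // Assumption of this sketch: empty or reversed ranges are rejected.
        if (startpos < 0 || endpos >= seq.length || startpos > endpos) {
            throw new IllegalArgumentException("invalid range " + startpos + ".." + endpos);
        }
        double sum = 0.0;
        for (int i = startpos; i <= endpos; i++) {
            sum += logProbs[seq[i]];
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyHomogeneousModel m = new ToyHomogeneousModel(new double[] {0.25, 0.25, 0.25, 0.25});
        int[] seq = {0, 1, 2, 3, 0, 1};
        // Positions 1..3 inclusive cover three symbols, so this is 3 * log(0.25).
        System.out.println(m.getLogProbFor(seq, 1, 3));
    }
}
```

Because the model is homogeneous, any window length is accepted; a fixed-length model would additionally have to compare endpos - startpos + 1 with its own length.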

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos)
                     throws Exception
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

If the length of the sequences whose probability is returned is fixed (e.g. in an inhomogeneous model) and the given sequence is longer than this fixed length, startpos gives the start position within the given sequence. For example, if the fixed length is 12, the given sequence has length 30, and startpos=15, the logarithm of the probability of the part from position 15 to 26 (inclusive) given the model is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
getLogProbFor(Sequence, int, int)
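
The window arithmetic in the example above (fixed length 12, startpos=15, last position 26) can be checked with a short sketch. WindowSketch and endPosition are hypothetical names, not part of Jstacs. Treating a model length of 0 as "score up to the end of the sequence" is an assumption about how an implementation might delegate to getLogProbFor(Sequence, int, int), not documented behavior.

```java
/** Hypothetical helper (not part of Jstacs) for the inclusive end position of the scored window. */
final class WindowSketch {

    /**
     * Returns the last position (inclusive) that is scored when starting at startpos.
     * modelLength == 0 stands for a homogeneous model; the sketch then assumes the
     * window runs to the end of the sequence.
     */
    static int endPosition(int modelLength, int sequenceLength, int startpos) {
        int endpos = (modelLength == 0) ? sequenceLength - 1 : startpos + modelLength - 1;
        if (startpos < 0 || endpos >= sequenceLength) {
            throw new IllegalArgumentException("window does not fit into the sequence");
        }
        return endpos;
    }

    public static void main(String[] args) {
        // Fixed length 12, sequence length 30, startpos 15: positions 15..26 are scored.
        System.out.println(WindowSketch.endPosition(12, 30, 15)); // prints 26
    }
}
```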

getLogProbFor

double getLogProbFor(Sequence sequence)
                     throws Exception
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence for which the logarithm of the probability/the value of the density function should be returned
Returns:
the logarithm of the probability or the value of the density function of the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
getLogProbFor(Sequence, int, int)

getLogPriorTerm

double getLogPriorTerm()
                       throws Exception
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML), 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong
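
To illustrate how such a term can enter an objective, here is a minimal sketch. It assumes a categorical distribution with a Dirichlet prior, which is one common choice and not necessarily what any Jstacs model uses. PriorTermSketch, logPriorTerm, and objective are hypothetical names. With all hyperparameters equal to 1, the term is 0 and the objective reduces to the log-likelihood, matching the ML case.

```java
/** Hypothetical illustration (not Jstacs code): log prior term of a Dirichlet prior. */
final class PriorTermSketch {

    /**
     * Sum over i of (alpha[i] - 1) * log(theta[i]): the log of a Dirichlet density
     * up to an additive constant. All theta[i] are assumed to be strictly positive.
     */
    static double logPriorTerm(double[] theta, double[] alpha) {
        double sum = 0.0;
        for (int i = 0; i < theta.length; i++) {
            sum += (alpha[i] - 1.0) * Math.log(theta[i]);
        }
        return sum;
    }

    /** MAP-style objective: log-likelihood plus the log prior term. */
    static double objective(double logLikelihood, double logPriorTerm) {
        return logLikelihood + logPriorTerm;
    }

    public static void main(String[] args) {
        double[] theta = {0.5, 0.5};
        double logLikelihood = -3.2; // arbitrary illustrative value
        // Uniform prior (all alpha = 1): the prior term vanishes, as in the ML case.
        System.out.println(objective(logLikelihood, logPriorTerm(theta, new double[] {1.0, 1.0})));
        // Informative prior: the prior term shifts the objective.
        System.out.println(objective(logLikelihood, logPriorTerm(theta, new double[] {2.0, 2.0})));
    }
}
```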

emitDataSet

DataSet emitDataSet(int numberOfSequences,
                    int... seqLength)
                    throws NotTrainedException,
                           Exception
This method returns a DataSet object containing artificial sequence(s).

There are two different ways to create a data set for a model with length 0 (homogeneous models).
  1. emitDataSet( int n, int l ) should return a data set with n sequences of length l.
  2. emitDataSet( int n, int[] l ) should return a data set with n sequences whose lengths correspond to the entries of the given array l.

There are two equivalent ways to create a data set for a model with length greater than 0 (inhomogeneous models).
emitDataSet( int n ) and emitDataSet( int n, null ) should both return a data set with n sequences of the length of the model (SequenceScore.getLength()).

The standard implementation throws an Exception.

Parameters:
numberOfSequences - the number of sequences that should be contained in the returned data set
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a DataSet containing the artificial sequence(s)
Throws:
Exception - if the emission did not succeed
NotTrainedException - if the model is not trained yet
See Also:
DataSet
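
The calling conventions above can be summarized in a sketch that resolves the varargs parameter into one length per emitted sequence. EmitLengthsSketch and resolveLengths are hypothetical names, not part of Jstacs. The exact error handling, including rejecting a non-empty seqLength for a fixed-length model, is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Jstacs): resolves seqLength into one length per sequence. */
final class EmitLengthsSketch {

    static int[] resolveLengths(int modelLength, int numberOfSequences, int... seqLength) {
        int[] lengths = new int[numberOfSequences];
        if (modelLength > 0) {
            // Inhomogeneous model: seqLength must be null or empty.
            if (seqLength != null && seqLength.length > 0) {
                throw new IllegalArgumentException("a fixed-length model takes no seqLength");
            }
            Arrays.fill(lengths, modelLength);
        } else if (seqLength != null && seqLength.length == 1) {
            // Homogeneous model, one common length for all sequences.
            Arrays.fill(lengths, seqLength[0]);
        } else if (seqLength != null && seqLength.length == numberOfSequences) {
            // Homogeneous model, one length per sequence.
            System.arraycopy(seqLength, 0, lengths, 0, numberOfSequences);
        } else {
            throw new IllegalArgumentException("seqLength must have 1 or numberOfSequences entries");
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolveLengths(0, 3, 5)));        // [5, 5, 5]
        System.out.println(Arrays.toString(resolveLengths(0, 3, 4, 6, 8)));  // [4, 6, 8]
        System.out.println(Arrays.toString(resolveLengths(12, 2)));          // [12, 12]
    }
}
```

Note that with varargs, calling resolveLengths(12, 2) passes an empty array, whereas passing (int[]) null explicitly yields null; the sketch accepts both for a fixed-length model.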

getMaximalMarkovOrder

byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
This method returns the maximal used Markov order, if possible.

Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer
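
Since the method may throw instead of returning a value, callers may want to guard the call. The following sketch shows one way to do so. OrderReporting is a hypothetical single-method interface standing in for StatisticalModel, and describeOrder is a hypothetical helper.

```java
/** Hypothetical illustration (not Jstacs code) of guarding a call to getMaximalMarkovOrder(). */
final class MarkovOrderSketch {

    /** Stand-in for the relevant part of StatisticalModel. */
    interface OrderReporting {
        byte getMaximalMarkovOrder() throws UnsupportedOperationException;
    }

    /** Returns a description of the order, or a fallback if the model cannot report one. */
    static String describeOrder(OrderReporting model) {
        try {
            return "order " + model.getMaximalMarkovOrder();
        } catch (UnsupportedOperationException e) {
            return "order unknown";
        }
    }

    public static void main(String[] args) {
        OrderReporting fixed = () -> (byte) 2;
        OrderReporting unknown = () -> {
            throw new UnsupportedOperationException("no fixed Markov order");
        };
        System.out.println(describeOrder(fixed));   // prints "order 2"
        System.out.println(describeOrder(unknown)); // prints "order unknown"
    }
}
```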