de.jstacs.models
Interface Model

All Superinterfaces:
Cloneable, Storable
All Known Implementing Classes:
AbstractHMM, AbstractMixtureModel, AbstractModel, BayesianNetworkModel, CompositeModel, DAGModel, DifferentiableHigherOrderHMM, DiscreteGraphicalModel, FSDAGModel, FSDAGModelForGibbsSampling, HiddenMotifMixture, HigherOrderHMM, HomogeneousMM, HomogeneousModel, InhomogeneousDGM, MixtureModel, NormalizableScoringFunctionModel, SamplingHigherOrderHMM, SamplingPhyloHMM, SharedStructureMixture, SingleHiddenMotifMixture, StrandModel, UniformModel, VariableLengthWrapperModel

public interface Model
extends Cloneable, Storable

This interface defines all methods for a probabilistic model.

Author:
Andre Gohr, Jan Grau, Jens Keilwagen

Method Summary
 Model clone()
          Creates a clone (deep copy) of the current Model instance.
 Sample emitSample(int numberOfSequences, int... seqLength)
          This method returns a Sample object containing artificial sequence(s).
 AlphabetContainer getAlphabetContainer()
          Returns the container of alphabets that were used when constructing the model.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the model.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 int getLength()
          Returns the length of sequences this model can classify.
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double[] getLogProbFor(Sample data)
          This method computes the logarithm of the probabilities of all sequences in the given sample.
 void getLogProbFor(Sample data, double[] res)
          This method computes and stores the logarithm of the probabilities for any sequence in the sample in the given double-array.
 double getLogProbFor(Sequence sequence)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by getCharacteristics().
 double getPriorTerm()
          Returns a value that is proportional to the prior.
 double getProbFor(Sequence sequence)
          Returns the probability of the given sequence given the model.
 double getProbFor(Sequence sequence, int startpos)
          Returns the probability of (a part of) the given sequence given the model.
 double getProbFor(Sequence sequence, int startpos, int endpos)
          Returns the probability of (a part of) the given sequence given the model.
 boolean isTrained()
          Returns true if the model has been trained successfully, false otherwise.
 boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
          This method tries to set a new instance of an AlphabetContainer for the current model.
 String toString()
          Should give a simple representation (text) of the model as String.
 void train(Sample data)
          Trains the Model object given the data as Sample.
 void train(Sample data, double[] weights)
          Trains the Model object given the data as Sample using the specified weights.
 
Methods inherited from interface de.jstacs.Storable
toXML
 

Method Detail

clone

Model clone()
            throws CloneNotSupportedException
Creates a clone (deep copy) of the current Model instance.

Returns:
the cloned instance
Throws:
CloneNotSupportedException - if something went wrong while cloning

train

void train(Sample data)
           throws Exception
Trains the Model object given the data as Sample.
This method should work non-incrementally: after the series of calls train(data1); train(data2), the model should be fully trained on data2 alone and not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences as Sample
Throws:
Exception - if the training did not succeed
See Also:
Sample.getElementAt(int), Sample.ElementEnumerator
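
The non-incremental contract can be illustrated with a minimal sketch. ToyFrequencyModel and its methods are hypothetical stand-ins, not part of Jstacs; sequences are assumed to be plain int arrays over the symbols 0..alphabetSize-1.

```java
import java.util.List;

/** Hypothetical stand-in for a model (NOT the Jstacs API): estimates symbol frequencies. */
class ToyFrequencyModel {
    private final int alphabetSize; // given by the constructor, never changed by train
    private double[] probs;         // learned parameters, null until trained

    ToyFrequencyModel(int alphabetSize) {
        this.alphabetSize = alphabetSize;
    }

    /** Non-incremental: the counts start from zero on every call. */
    void train(List<int[]> data) {
        double[] counts = new double[alphabetSize];
        double total = 0;
        for (int[] seq : data) {
            for (int symbol : seq) {
                counts[symbol]++;
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no symbols to train on");
        }
        for (int a = 0; a < alphabetSize; a++) {
            counts[a] /= total;
        }
        probs = counts; // replaces whatever an earlier call has learned
    }

    boolean isTrained() {
        return probs != null;
    }

    double probOf(int symbol) {
        return probs[symbol];
    }
}
```

Calling train twice on such an object leaves only the estimate from the second data set, because no counts survive between calls.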

train

void train(Sample data,
           double[] weights)
           throws Exception
Trains the Model object given the data as Sample using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as the sample has sequences. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally: after the series of calls train(data1); train(data2), the model should be fully trained on data2 alone and not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences as Sample
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the sample do not match)
See Also:
Sample.getElementAt(int), Sample.ElementEnumerator
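
How the weights enter an estimate can be sketched as follows. WeightedCountSketch and weightedFrequencies are hypothetical names used for illustration only; the sketch assumes int-array sequences and a simple frequency estimate, which is not necessarily what any implementing class does.

```java
/** Hypothetical helper (NOT the Jstacs API): weighted relative symbol frequencies. */
class WeightedCountSketch {
    /**
     * weights[i] belongs to data[i]; weights == null is treated as "all weights are one".
     */
    static double[] weightedFrequencies(int[][] data, double[] weights, int alphabetSize) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("need exactly one weight per sequence");
        }
        double[] counts = new double[alphabetSize];
        double total = 0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            if (w < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            for (int symbol : data[i]) {
                counts[symbol] += w; // each symbol counts w times
                total += w;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("total weight is zero");
        }
        for (int a = 0; a < alphabetSize; a++) {
            counts[a] /= total;
        }
        return counts;
    }
}
```

With the two sequences {0, 0} and {1, 1} and weights {3, 1}, the first sequence contributes three times as much as the second; with weights == null both contribute equally.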

getProbFor

double getProbFor(Sequence sequence)
                  throws NotTrainedException,
                         Exception
Returns the probability of the given sequence given the model. If at least one random variable is continuous the value of the density function is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence for which the probability/the value of the density function should be returned
Returns:
the probability or the value of the density function of the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet

getProbFor

double getProbFor(Sequence sequence,
                  int startpos)
                  throws NotTrainedException,
                         Exception
Returns the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the value of the density function is returned.

If the length of the sequences whose probability should be returned is fixed (e.g. in an inhomogeneous model) and the given sequence is longer than this fixed length, startpos gives the start position within the given sequence. For example, if the fixed length is 12, the given sequence has length 30, and startpos=15, the probability of the part from position 15 to 26 (inclusive) given the model should be returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Returns:
the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
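
The position arithmetic of the example above (fixed length 12, sequence length 30, startpos=15) can be checked with a small sketch. FixedLengthWindow is a hypothetical helper, not part of Jstacs, and it assumes 0-based positions.

```java
/** Hypothetical helper (NOT the Jstacs API): which positions a fixed-length model scores. */
class FixedLengthWindow {
    /** Returns {first, last}, both inclusive, assuming 0-based positions. */
    static int[] window(int sequenceLength, int modelLength, int startpos) {
        int last = startpos + modelLength - 1; // inclusive end of the scored part
        if (modelLength <= 0 || startpos < 0 || last >= sequenceLength) {
            throw new IllegalArgumentException("window does not fit into the sequence");
        }
        return new int[] { startpos, last };
    }
}
```

For window(30, 12, 15) the sketch yields the positions 15 and 26; a start position of 19 or more no longer fits into a sequence of length 30.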

getProbFor

double getProbFor(Sequence sequence,
                  int startpos,
                  int endpos)
                  throws NotTrainedException,
                         Exception
Returns the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the value of the density function is returned.

This method extends getProbFor(Sequence, int) to models (e.g. homogeneous ones) for which the length of the sequences whose probability should be returned is not fixed. The end position of the part of the given sequence is given in addition, and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model (e.g. startpos > endpos, endpos > sequence.length, ...)
NotTrainedException - if the model is not trained yet
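
A minimal sketch of the inclusive startpos/endpos semantics for a model without fixed length. RangeProbSketch is hypothetical and not part of Jstacs; it assumes 0-based positions, int-array sequences, and independent, identically distributed symbols as the simplest homogeneous case.

```java
/** Hypothetical helper (NOT the Jstacs API): probability of sequence[startpos..endpos]. */
class RangeProbSketch {
    static double probFor(int[] sequence, double[] symbolProbs, int startpos, int endpos) {
        // With 0-based, inclusive positions the last valid endpos is sequence.length - 1.
        if (startpos < 0 || startpos > endpos || endpos >= sequence.length) {
            throw new IllegalArgumentException("invalid range " + startpos + ".." + endpos);
        }
        double p = 1.0;
        for (int i = startpos; i <= endpos; i++) { // endpos is included
            p *= symbolProbs[sequence[i]];
        }
        return p;
    }
}
```

For a sequence of four symbols with probability 0.5 each, the range 1..2 covers two positions and gives 0.25.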

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos,
                     int endpos)
                     throws Exception
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence, int, int).

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model (e.g. startpos > endpos, endpos > sequence.length, ...)
NotTrainedException - if the model is not trained yet
See Also:
getProbFor(Sequence, int, int)

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos)
                     throws Exception
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence, int).

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
getProbFor(Sequence, int)

getLogProbFor

double getLogProbFor(Sequence sequence)
                     throws Exception
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence).

Parameters:
sequence - the given sequence for which the logarithm of the probability/the value of the density function should be returned
Returns:
the logarithm of the probability or of the value of the density function of the given sequence given the model
Throws:
Exception - if the sequence could not be handled by the model
NotTrainedException - if the model is not trained yet
See Also:
getProbFor(Sequence)
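
One practical reason to prefer the getLogProbFor variants is numerical: a product of many small probabilities underflows in double arithmetic, while the sum of their logarithms stays finite. The following sketch shows the effect; LogProbSketch is a hypothetical helper, not part of Jstacs, and assumes independent symbols.

```java
/** Hypothetical helper (NOT the Jstacs API): plain vs. logarithmic sequence probability. */
class LogProbSketch {
    /** Product of the symbol probabilities; underflows to 0.0 for long sequences. */
    static double prob(int[] sequence, double[] symbolProbs) {
        double p = 1.0;
        for (int symbol : sequence) {
            p *= symbolProbs[symbol];
        }
        return p;
    }

    /** Sum of the log symbol probabilities; stays finite for long sequences. */
    static double logProb(int[] sequence, double[] symbolProbs) {
        double lp = 0.0;
        for (int symbol : sequence) {
            lp += Math.log(symbolProbs[symbol]);
        }
        return lp;
    }
}
```

For a sequence of 2000 symbols with probability 0.25 each, prob returns 0.0 because 0.25^2000 is far below the smallest positive double, whereas logProb returns approximately 2000 * ln(0.25).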

getLogProbFor

double[] getLogProbFor(Sample data)
                       throws Exception
This method computes the logarithm of the probabilities of all sequences in the given sample. The values are stored in an array according to the index of the respective sequence in the sample.

The probability for any sequence shall be computed independent of all other sequences in the sample. So the result should be exactly the same as for the method getLogProbFor(Sequence).

Parameters:
data - the sample of sequences
Returns:
an array containing the logarithm of the probabilities of all sequences of the sample
Throws:
Exception - if something went wrong
See Also:
getLogProbFor(Sequence)

getLogProbFor

void getLogProbFor(Sample data,
                   double[] res)
                   throws Exception
This method computes and stores the logarithm of the probabilities for any sequence in the sample in the given double-array.

The probability for any sequence shall be computed independent of all other sequences in the sample. So the result should be exactly the same as for the method getLogProbFor(Sequence).

Parameters:
data - the sample of sequences
res - the array for the results, has to have length data.getNumberOfElements() (which returns the number of sequences in the sample)
Throws:
Exception - if something went wrong
See Also:
getLogProbFor(Sample)
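
The relationship between the per-sequence method and the two sample-based methods can be sketched as follows. BatchLogProbSketch and its method names are hypothetical and not part of Jstacs; a sample is assumed to be an array of int-array sequences with independent symbols.

```java
/** Hypothetical helper (NOT the Jstacs API): log probabilities for a whole sample. */
class BatchLogProbSketch {
    /** Log probability of one sequence. */
    static double logProbOfSequence(int[] sequence, double[] symbolProbs) {
        double lp = 0.0;
        for (int symbol : sequence) {
            lp += Math.log(symbolProbs[symbol]);
        }
        return lp;
    }

    /** Fills res so that res[i] is the log probability of data[i]. */
    static void logProbOfSample(int[][] data, double[] symbolProbs, double[] res) {
        if (res.length != data.length) {
            throw new IllegalArgumentException("res needs one entry per sequence");
        }
        for (int i = 0; i < data.length; i++) {
            // Each sequence is scored on its own, independent of the others.
            res[i] = logProbOfSequence(data[i], symbolProbs);
        }
    }

    /** Convenience variant that allocates the result array itself. */
    static double[] logProbOfSample(int[][] data, double[] symbolProbs) {
        double[] res = new double[data.length];
        logProbOfSample(data, symbolProbs, res);
        return res;
    }
}
```

Passing a caller-owned array lets repeated evaluations reuse the same buffer instead of allocating a new one on every call.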

getPriorTerm

double getPriorTerm()
                    throws Exception
Returns a value that is proportional to the prior. For maximum likelihood (ML) 1 should be returned.

Returns:
a value that is proportional to the prior
Throws:
Exception - if something went wrong

getLogPriorTerm

double getLogPriorTerm()
                       throws Exception
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong
See Also:
getPriorTerm()
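
How the two prior terms relate can be illustrated with a sketch. The choice of a symmetric Dirichlet prior is an assumption made only for this example and is not stated by the interface; PriorTermSketch is a hypothetical helper, not part of Jstacs.

```java
/** Hypothetical helper (NOT the Jstacs API): prior terms for a symmetric Dirichlet prior. */
class PriorTermSketch {
    /**
     * Log of the unnormalized density prod_i p_i^(alpha - 1).
     * Assumes all probabilities are strictly positive.
     */
    static double logPriorTerm(double[] probs, double alpha) {
        double sumOfLogs = 0.0;
        for (double p : probs) {
            sumOfLogs += Math.log(p);
        }
        return (alpha - 1.0) * sumOfLogs;
    }

    /** The prior term is the exponential of the log prior term. */
    static double priorTerm(double[] probs, double alpha) {
        return Math.exp(logPriorTerm(probs, alpha));
    }
}
```

With alpha = 1 the prior is flat, which corresponds to the maximum likelihood case: the log prior term is 0 and the prior term is 1.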

emitSample

Sample emitSample(int numberOfSequences,
                  int... seqLength)
                  throws NotTrainedException,
                         Exception
This method returns a Sample object containing artificial sequence(s).

There are two different possibilities to create a sample for a model with length 0 (homogeneous models).
  1. emitSample( int n, int l ) should return a sample with n sequences of length l.
  2. emitSample( int n, int[] l ) should return a sample with n sequences, where each sequence has the length given by the corresponding entry of the array l.

There are two equivalent possibilities to create a sample for a model with length greater than 0 (inhomogeneous models).
Both emitSample( int n ) and emitSample( int n, null ) should return a sample with n sequences whose length is the length of the model (getLength()).

The standard implementation throws an Exception.

Parameters:
numberOfSequences - the number of sequences that should be contained in the returned sample
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a Sample containing the artificial sequence(s)
Throws:
Exception - if the emission did not succeed
NotTrainedException - if the model is not trained yet
See Also:
Sample
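
The call variants described above can be sketched as a resolution of the varargs parameter into one length per sequence. EmitLengthsSketch and resolveLengths are hypothetical names, not part of Jstacs; rejecting explicit lengths for a fixed-length model is a design choice of this sketch, not something the interface prescribes.

```java
import java.util.Arrays;

/** Hypothetical helper (NOT the Jstacs API): lengths of the sequences to be emitted. */
class EmitLengthsSketch {
    /** modelLength == 0 stands for a homogeneous model, as returned by getLength(). */
    static int[] resolveLengths(int modelLength, int numberOfSequences, int... seqLength) {
        int[] lengths = new int[numberOfSequences];
        boolean noLengthGiven = (seqLength == null || seqLength.length == 0);
        if (modelLength > 0) {
            // Inhomogeneous model: the length is fixed by the model itself.
            if (!noLengthGiven) {
                throw new IllegalArgumentException("model has fixed length " + modelLength);
            }
            Arrays.fill(lengths, modelLength);
        } else if (!noLengthGiven && seqLength.length == 1) {
            // Homogeneous model, one length for all sequences.
            Arrays.fill(lengths, seqLength[0]);
        } else if (!noLengthGiven && seqLength.length == numberOfSequences) {
            // Homogeneous model, one length per sequence.
            System.arraycopy(seqLength, 0, lengths, 0, numberOfSequences);
        } else {
            throw new IllegalArgumentException("need one length or one length per sequence");
        }
        return lengths;
    }
}
```

For example, resolveLengths(0, 3, 10) gives three sequences of length 10, resolveLengths(0, 2, new int[]{5, 7}) gives lengths 5 and 7, and resolveLengths(12, 2) gives two sequences of length 12.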

getAlphabetContainer

AlphabetContainer getAlphabetContainer()
Returns the container of alphabets that were used when constructing the model.

Returns:
the container of alphabets that were used when constructing the model

getInstanceName

String getInstanceName()
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name

getLength

int getLength()
Returns the length of sequences this model can classify. Models that can only classify sequences of a defined length are, e.g., PWMs or inhomogeneous Markov models. If the model can classify sequences of arbitrary length, e.g. homogeneous Markov models, this method returns 0 (zero).

Returns:
the length of sequences the model can classify

getMaximalMarkovOrder

byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
This method returns the maximal used Markov order, if possible.

Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer
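
Because the method may throw an UnsupportedOperationException, callers that only need a best-effort answer can guard the call. The sketch below uses a hypothetical one-method interface HasOrder as a stand-in for the model; neither it nor MarkovOrderSketch is part of Jstacs.

```java
/** Hypothetical helper (NOT the Jstacs API): guarded query of the Markov order. */
class MarkovOrderSketch {
    /** Stand-in for the relevant part of the model interface. */
    interface HasOrder {
        byte getMaximalMarkovOrder() throws UnsupportedOperationException;
    }

    /** Returns the maximal Markov order, or fallback if the model cannot answer. */
    static int orderOrDefault(HasOrder model, int fallback) {
        try {
            return model.getMaximalMarkovOrder();
        } catch (UnsupportedOperationException e) {
            return fallback;
        }
    }
}
```

A caller might pass -1 as fallback to mark the order as unknown.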

isTrained

boolean isTrained()
Returns true if the model has been trained successfully, false otherwise.

Returns:
true if the model has been trained successfully, false otherwise

getCharacteristics

ResultSet getCharacteristics()
                             throws Exception
Returns some information characterizing or describing the current instance of the model. This could be e.g. the number of edges for a Bayesian network or an image showing some representation of the model. The set of characteristics should always include the XML-representation of the model. The corresponding result type is StorableResult.

Returns:
the characteristics of the current instance of the model
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult

getNumericalCharacteristics

NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Returns the subset of numerical values that are also returned by getCharacteristics().

Returns:
the numerical characteristics of the current instance of the model
Throws:
Exception - if some of the characteristics could not be defined

toString

String toString()
Should give a simple representation (text) of the model as String.

Overrides:
toString in class Object
Returns:
the representation as String

setNewAlphabetContainerInstance

boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
This method tries to set a new instance of an AlphabetContainer for the current model. This instance has to be consistent with the underlying instance of an AlphabetContainer.

This method can be very useful to save time.

Parameters:
abc - the alphabets in an AlphabetContainer
Returns:
true if the new instance could be set
See Also:
getAlphabetContainer(), AlphabetContainer.checkConsistency(AlphabetContainer)