de.jstacs.models
Interface Model

All Superinterfaces:
Cloneable, Storable
All Known Implementing Classes:
AbstractMixtureModel, AbstractModel, BayesianNetworkModel, CompositeModel, DAGModel, DiscreteGraphicalModel, FSDAGModel, FSDAGModelForGibbsSampling, InhomogeneousDGM, MixtureModel, SharedStructureMixture, StrandModel, UniformModel

public interface Model
extends Cloneable, Storable

This interface defines all methods for a probabilistic model.

Author:
Andre Gohr, Jan Grau, Jens Keilwagen

Method Summary
 Model clone()
          Creates a clone (deep copy) of the current Model instance.
 Sample emitSample(int numberOfSequences, int... seqLength)
          This method returns a Sample object containing artificial sequence(s).
 AlphabetContainer getAlphabetContainer()
          Returns the container of alphabets that were used when constructing the model.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the model.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 int getLength()
          Returns the length of the sequences this model can classify.
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double[] getLogProbFor(Sample data)
          This method computes the logarithm of the probabilities of all sequences in the given sample.
 void getLogProbFor(Sample data, double[] res)
          This method computes and stores the logarithm of the probability of each sequence in the sample in the given double array.
 double getLogProbFor(Sequence sequence)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal Markov order used, if possible.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by getCharacteristics.
 double getPriorTerm()
          Returns a value that is proportional to the prior.
 double getProbFor(Sequence sequence)
          Returns the probability of the given sequence given the model.
 double getProbFor(Sequence sequence, int startpos)
          Returns the probability of the given sequence given the model.
 double getProbFor(Sequence sequence, int startpos, int endpos)
          Returns the probability of the given sequence given the model.
 boolean isTrained()
          Returns true if the model has been trained successfully, false otherwise.
 boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
          This method tries to set a new instance of an AlphabetContainer for the current model.
 String toString()
          Should give a simple representation (text) of the model as String.
 void train(Sample data)
          Trains the Model object given the data as Sample.
 void train(Sample data, double[] weights)
          Trains the Model object given the data as Sample using the specified weights.
 
Methods inherited from interface de.jstacs.Storable
toXML
 

Method Detail

clone

Model clone()
            throws CloneNotSupportedException
Creates a clone (deep copy) of the current Model instance.

Returns:
the cloned instance
Throws:
CloneNotSupportedException

train

void train(Sample data)
           throws Exception
Trains the Model object given the data as Sample.
This method should work non-incrementally. That means that after the series train(data1); train(data2), the model should be fully trained on data2 and not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences as Sample
Throws:
Exception - an Exception should be thrown if the training did not succeed.
See Also:
Sample.getElementAt(int), Sample.ElementEnumerator
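
A minimal sketch of the non-incremental contract, under stated assumptions: NonIncrementalCounts is a hypothetical order-0 model (not a Jstacs class), and sequences are plain int[] symbol indices instead of Sample/Sequence objects. Each call to train starts from fresh counts, so a second call replaces, rather than extends, the result of the first.

```java
import java.util.List;

// Hypothetical order-0 model (not a Jstacs class) used to illustrate
// non-incremental training; sequences are int[] symbol indices.
class NonIncrementalCounts {
    private final double[] probs;
    private boolean trained = false;

    NonIncrementalCounts(int alphabetSize) {
        probs = new double[alphabetSize];
    }

    // Each call starts from fresh counts, discarding earlier training.
    void train(List<int[]> data) {
        double[] counts = new double[probs.length];
        double total = 0.0;
        for (int[] seq : data) {
            for (int symbol : seq) {
                counts[symbol]++;
                total++;
            }
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("no symbols to train on");
        }
        for (int a = 0; a < probs.length; a++) {
            probs[a] = counts[a] / total;
        }
        trained = true;
    }

    double getProb(int symbol) {
        return probs[symbol];
    }

    boolean isTrained() {
        return trained;
    }
}
```

For example, training on {0, 0, 0} and then on {1, 1, 0, 1} gives getProb(1) = 3/4; an incremental implementation would instead give 3/7.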

train

void train(Sample data,
           double[] weights)
           throws Exception
Trains the Model object given the data as Sample using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have the number of sequences in the sample as its length. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally. That means that after the series train(data1); train(data2), the model should be fully trained on data2 and not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - an Exception should be thrown if the training did not succeed (e.g. the length of weights does not match the number of sequences in data).
See Also:
Sample.getElementAt(int), Sample.ElementEnumerator
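
The weight handling can be sketched as follows. WeightedTraining and weightedFrequencies are hypothetical names, sequences are plain int[] and the model is a simple order-0 frequency estimate; the sketch only illustrates the conventions stated above: weight i belongs to sequence i, weights == null means all weights are one, and a length mismatch or a negative weight is rejected.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the Jstacs API): weight handling for
// train(Sample data, double[] weights), using plain int[] sequences
// over symbols 0..alphabetSize-1 and an order-0 frequency estimate.
class WeightedTraining {

    static double[] weightedFrequencies(List<int[]> data, double[] weights, int alphabetSize) {
        if (weights == null) {
            // weights == null is treated as "all weights are one"
            weights = new double[data.size()];
            Arrays.fill(weights, 1.0);
        }
        if (weights.length != data.size()) {
            throw new IllegalArgumentException(
                    "got " + weights.length + " weights for " + data.size() + " sequences");
        }
        double[] freqs = new double[alphabetSize];
        double total = 0.0;
        for (int i = 0; i < data.size(); i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            // the weight at index i belongs to the sequence at index i
            for (int symbol : data.get(i)) {
                freqs[symbol] += weights[i];
                total += weights[i];
            }
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("total weight of all symbols is zero");
        }
        for (int a = 0; a < alphabetSize; a++) {
            freqs[a] /= total;
        }
        return freqs;
    }
}
```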

getProbFor

double getProbFor(Sequence sequence)
                  throws NotTrainedException,
                         Exception
Returns the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the sequence
Returns:
the probability or the value of the density function of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.

getProbFor

double getProbFor(Sequence sequence,
                  int startpos)
                  throws NotTrainedException,
                         Exception
Returns the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

If the length of the sequences whose probability should be returned is fixed (e.g. in an inhomogeneous model) and the given sequence is longer than this fixed length, the start position within the given sequence is given by startpos. For example, if the fixed length is 12, the length of the given sequence is 30 and startpos=15, the probability of the part from position 15 to 26 (inclusive) given the model should be returned.
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the sequence
startpos - the start position
Returns:
the probability or the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
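
To make the worked example concrete, the following hypothetical sketch scores a fixed window of an inhomogeneous model (a position weight matrix over int[] symbol indices; FixedLengthWindow is not part of Jstacs). With a model length of 12 and startpos=15, the last covered position is 15 + 12 - 1 = 26, so a sequence of length 30 admits start positions 0 through 18.

```java
// Hypothetical sketch (not the Jstacs API) of getProbFor(sequence, startpos)
// for an inhomogeneous, fixed-length model: a position weight matrix.
class FixedLengthWindow {
    private final double[][] pwm; // pwm[i][a] = P(symbol a at model position i)

    FixedLengthWindow(double[][] pwm) {
        this.pwm = pwm;
    }

    int getLength() {
        return pwm.length;
    }

    // Last sequence position (inclusive) covered when scoring from startpos.
    int lastPosition(int startpos) {
        return startpos + getLength() - 1;
    }

    double getProbFor(int[] sequence, int startpos) {
        if (startpos < 0 || lastPosition(startpos) >= sequence.length) {
            throw new IllegalArgumentException("window does not fit into the sequence");
        }
        double p = 1.0;
        for (int i = 0; i < getLength(); i++) {
            // model position i is aligned with sequence position startpos + i
            p *= pwm[i][sequence[startpos + i]];
        }
        return p;
    }
}
```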

getProbFor

double getProbFor(Sequence sequence,
                  int startpos,
                  int endpos)
                  throws NotTrainedException,
                         Exception
Returns the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

It extends getProbFor(Sequence sequence, int startpos) to models, e.g. homogeneous ones, for which the length of the sequences whose probability should be returned is not fixed. Additionally, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the sequence
startpos - the start position
endpos - the last position to be taken into account
Returns:
the probability or the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the model could not handle the sequence (e.g. startpos > endpos, endpos > sequence.length, ...)
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
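
A homogeneous model can score a part of any length, so only the range needs to be validated. This sketch assumes a hypothetical order-0 model (HomogeneousRange, not a Jstacs class) over int[] symbol indices, and treats endpos as inclusive, as described above.

```java
// Hypothetical sketch (not the Jstacs implementation) of
// getProbFor(sequence, startpos, endpos) for a homogeneous order-0 model.
class HomogeneousRange {
    private final double[] symbolProbs; // same distribution at every position

    HomogeneousRange(double[] symbolProbs) {
        this.symbolProbs = symbolProbs;
    }

    double getProbFor(int[] sequence, int startpos, int endpos) {
        if (startpos < 0 || startpos > endpos || endpos >= sequence.length) {
            throw new IllegalArgumentException(
                    "invalid range [" + startpos + ", " + endpos + "] for length " + sequence.length);
        }
        double p = 1.0;
        for (int i = startpos; i <= endpos; i++) { // endpos is inclusive
            p *= symbolProbs[sequence[i]];
        }
        return p;
    }
}
```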

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos,
                     int endpos)
                     throws Exception
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence, int, int)

Parameters:
sequence - the sequence
startpos - the start position
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or of the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the model could not handle the sequence (e.g. startpos > endpos, endpos > sequence.length, ...)
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
getProbFor(Sequence, int, int)
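
One practical reason to use getLogProbFor instead of taking the logarithm of getProbFor is floating-point underflow: a product of many probabilities can round to 0.0, whereas a sum of log-probabilities stays finite. The following hypothetical sketch (LogSpaceScoring, an order-0 model over int[] sequences) contrasts both computations; it does not claim that Jstacs models are implemented this way.

```java
// Hypothetical sketch: probability as a product vs. log-probability as a sum
// over positions startpos..endpos (inclusive) of an order-0 model.
class LogSpaceScoring {

    // Direct product; underflows to 0.0 for long ranges.
    static double prob(double[] symbolProbs, int[] seq, int startpos, int endpos) {
        double p = 1.0;
        for (int i = startpos; i <= endpos; i++) {
            p *= symbolProbs[seq[i]];
        }
        return p;
    }

    // Sum of logarithms; stays finite where the product underflows.
    static double logProb(double[] symbolProbs, int[] seq, int startpos, int endpos) {
        double logP = 0.0;
        for (int i = startpos; i <= endpos; i++) {
            logP += Math.log(symbolProbs[seq[i]]);
        }
        return logP;
    }
}
```

With 2000 positions of probability 0.25 each, prob(...) returns 0.0, while logProb(...) returns about 2000 * ln(0.25), roughly -2772.59.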

getLogProbFor

double getLogProbFor(Sequence sequence,
                     int startpos)
                     throws Exception
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence, int)

Parameters:
sequence - the sequence
startpos - the start position
Returns:
the logarithm of the probability or of the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
getProbFor(Sequence, int)

getLogProbFor

double getLogProbFor(Sequence sequence)
                     throws Exception
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the logarithm of the value of the density function is returned.

For more details see getProbFor(Sequence)

Parameters:
sequence - the sequence
Returns:
the logarithm of the probability or of the value of the density function of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
getProbFor(Sequence)

getLogProbFor

double[] getLogProbFor(Sample data)
                       throws Exception
This method computes the logarithm of the probabilities of all sequences in the given sample. The values are stored in the array according to the index of the sequence in the sample.

The probability of each sequence shall be computed independently of all other sequences in the sample, so the result should be exactly the same as calling getLogProbFor(Sequence) for each sequence.

Parameters:
data - the sample
Returns:
an array containing the logarithm of the probabilities of all sequences of the sample
Throws:
Exception - if something went wrong
See Also:
getLogProbFor(Sequence)
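
The independence requirement means getLogProbFor(Sample) can be expressed as a loop over the per-sequence method. In this hypothetical sketch, SampleScoring is not a Jstacs class, a List<int[]> stands in for the Sample, and the ToDoubleFunction stands in for getLogProbFor(Sequence).

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the getLogProbFor(Sample) contract: entry i of the
// result is the log-probability of sequence i, computed on its own.
class SampleScoring {
    static double[] logProbs(List<int[]> data, ToDoubleFunction<int[]> logProbForSequence) {
        double[] res = new double[data.size()];
        for (int i = 0; i < data.size(); i++) {
            // same index as in the sample; no state is shared between sequences
            res[i] = logProbForSequence.applyAsDouble(data.get(i));
        }
        return res;
    }
}
```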

getLogProbFor

void getLogProbFor(Sample data,
                   double[] res)
                   throws Exception
This method computes and stores the logarithm of the probability of each sequence in the sample in the given double array.

The probability of each sequence shall be computed independently of all other sequences in the sample, so the result should be exactly the same as calling getLogProbFor(Sequence) for each sequence.

Parameters:
data - the sample
res - the array for the results, has to have length data.getNumberOfElements()
Throws:
Exception - if something went wrong
See Also:
getLogProbFor(Sample)
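
The variant with a result array differs only in who allocates the array. In this hypothetical sketch, SampleScoringInto is not a Jstacs class, a List<int[]> stands in for the Sample, and a ToDoubleFunction stands in for getLogProbFor(Sequence); a res array of the wrong length is rejected.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of getLogProbFor(Sample, double[] res): results are
// written into a caller-supplied array of length data.size().
class SampleScoringInto {
    static void logProbs(List<int[]> data, double[] res, ToDoubleFunction<int[]> logProbForSequence) {
        if (res.length != data.size()) {
            throw new IllegalArgumentException(
                    "res has length " + res.length + " but the sample has " + data.size() + " sequences");
        }
        for (int i = 0; i < data.size(); i++) {
            res[i] = logProbForSequence.applyAsDouble(data.get(i));
        }
    }
}
```

Passing in res presumably lets a caller reuse one array across repeated evaluations of the same sample instead of allocating a new one each time.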

getPriorTerm

double getPriorTerm()
                    throws Exception
Returns a value that is proportional to the prior. For maximum likelihood (ML) estimation, 1 should be returned.

Returns:
a value that is proportional to the prior
Throws:
Exception - if something went wrong

getLogPriorTerm

double getLogPriorTerm()
                       throws Exception
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) estimation, 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong
See Also:
getPriorTerm()
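
Because the prior term is only defined up to a constant factor, its logarithm equals the log of the prior up to an additive constant, and the ML convention is consistent: log(1) = 0. As one plausible use, assumed here rather than taken from Jstacs, the log prior term could be added to the summed log-probabilities of the training sequences to form a MAP-style objective. PriorTerms and objective are hypothetical names.

```java
// Hypothetical sketch (not the Jstacs API): combining log-probabilities
// with a log prior term. Only the ML convention comes from the docs above.
class PriorTerms {
    // ML convention: the prior term is 1, so the log prior term is 0.
    static final double ML_PRIOR_TERM = 1.0;
    static final double ML_LOG_PRIOR_TERM = Math.log(ML_PRIOR_TERM);

    // Sum of the sequences' log-probabilities plus the log prior term,
    // i.e. a log posterior up to an additive constant.
    static double objective(double[] logProbs, double logPriorTerm) {
        double sum = logPriorTerm;
        for (double lp : logProbs) {
            sum += lp;
        }
        return sum;
    }
}
```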

emitSample

Sample emitSample(int numberOfSequences,
                  int... seqLength)
                  throws NotTrainedException,
                         Exception
This method returns a Sample object containing artificial sequence(s).

There are two ways to create a sample for a model of length 0 (i.e. a model that can handle sequences of arbitrary length):
  1. emitSample( int n, int l ) should return a sample with n sequences of length l.
  2. emitSample( int n, int[] l ) should return a sample with n sequences whose lengths are given by the corresponding entries of the array l.


There are two ways to create a sample for a model with length greater than 0: emitSample( int n ) and emitSample( int n, null ) should both return a sample with n sequences of the model's length (getLength()).

The standard implementation throws an Exception.

Parameters:
numberOfSequences - the number of sequences that should be contained in the returned sample
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
Sample containing the artificial sequence(s)
Throws:
Exception - an Exception should be thrown if the emission did not succeed.
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
Sample
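
The rules for the seqLength argument can be summarized as a small resolution step. EmitLengths.resolve is a hypothetical helper (not part of Jstacs) in which modelLength plays the role of getLength(); rejecting a non-empty seqLength for a fixed-length model is a choice of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper: turns the seqLength argument of
// emitSample(numberOfSequences, seqLength...) into one length per sequence.
class EmitLengths {
    static int[] resolve(int modelLength, int numberOfSequences, int... seqLength) {
        int[] lengths = new int[numberOfSequences];
        if (modelLength > 0) {
            // fixed-length model: seqLength must be null or empty
            if (seqLength != null && seqLength.length > 0) {
                throw new IllegalArgumentException("a model of fixed length takes no seqLength");
            }
            Arrays.fill(lengths, modelLength);
        } else if (seqLength != null && seqLength.length == 1) {
            // case 1: all n sequences have length l
            Arrays.fill(lengths, seqLength[0]);
        } else if (seqLength != null && seqLength.length == numberOfSequences) {
            // case 2: one length per sequence
            lengths = seqLength.clone();
        } else {
            throw new IllegalArgumentException("seqLength must have 1 or numberOfSequences entries");
        }
        return lengths;
    }
}
```

For example, resolve(12, 3) yields {12, 12, 12}, resolve(0, 3, 5) yields {5, 5, 5}, and resolve(0, 2, 4, 7) yields {4, 7}.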

getAlphabetContainer

AlphabetContainer getAlphabetContainer()
Returns the container of alphabets that were used when constructing the model.

Returns:
the container of alphabets

getInstanceName

String getInstanceName()
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name

getLength

int getLength()
Returns the length of the sequences this model can classify. Models that can only classify sequences of a fixed length include, e.g., PWMs and inhomogeneous Markov models. If the model can classify sequences of arbitrary length, e.g. a homogeneous Markov model, this method returns 0 (zero).

Returns:
the length

getMaximalMarkovOrder

byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
This method returns the maximal Markov order used, if possible.

Returns:
the maximal Markov order used
Throws:
UnsupportedOperationException - if the model can't give a proper answer
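
Since getMaximalMarkovOrder() answers only if possible, a caller may want to treat the exception as missing information. In this hypothetical sketch, MarkovOrderQuery is not a Jstacs class and a Supplier<Byte> stands in for a method reference such as model::getMaximalMarkovOrder.

```java
import java.util.OptionalInt;
import java.util.function.Supplier;

// Hypothetical caller-side sketch: map UnsupportedOperationException
// from getMaximalMarkovOrder() to an empty OptionalInt.
class MarkovOrderQuery {
    static OptionalInt maximalMarkovOrder(Supplier<Byte> getMaximalMarkovOrder) {
        try {
            // Byte is unboxed and widened to int
            return OptionalInt.of(getMaximalMarkovOrder.get());
        } catch (UnsupportedOperationException e) {
            // the model cannot give a proper answer
            return OptionalInt.empty();
        }
    }
}
```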

isTrained

boolean isTrained()
Returns true if the model has been trained successfully, false otherwise.

Returns:
true if the model has been trained successfully, false otherwise.

getCharacteristics

ResultSet getCharacteristics()
                             throws Exception
Returns some information characterizing or describing the current instance of the model. This could be, e.g., the number of edges of a Bayesian network or an image showing some representation of the model. The set of characteristics should always include the XML representation of the model; the corresponding result type is ObjectResult.

Returns:
the characteristics
Throws:
Exception - an Exception is thrown if some of the characteristics could not be defined
See Also:
StorableResult

getNumericalCharacteristics

NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Returns the subset of numerical values that are also returned by getCharacteristics.

Returns:
the numerical characteristics
Throws:
Exception - an Exception is thrown if some of the characteristics could not be defined

toString

String toString()
Should give a simple representation (text) of the model as String.

Overrides:
toString in class Object
Returns:
the representation as String

setNewAlphabetContainerInstance

boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
This method tries to set a new instance of an AlphabetContainer for the current model. This instance has to be consistent with the underlying instance of an AlphabetContainer.

This method can be very useful to save time.

Parameters:
abc - the alphabets
Returns:
true if the new instance could be set
See Also:
getAlphabetContainer(), AlphabetContainer.checkConsistency(AlphabetContainer)