de.jstacs.models
Class AbstractModel

java.lang.Object
  extended by de.jstacs.models.AbstractModel
All Implemented Interfaces:
Model, Storable, Cloneable
Direct Known Subclasses:
AbstractMixtureModel, CompositeModel, DiscreteGraphicalModel, UniformModel

public abstract class AbstractModel
extends Object
implements Cloneable, Storable, Model

Abstract class for a model for pattern recognition.
To write a StringBuffer to a file or to read one from a file (see fromXML(StringBuffer), toXML()), you can use the class FileManager.

Author:
Andre Gohr, Jan Grau, Jens Keilwagen
See Also:
FileManager
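
As a rough illustration of the file round trip mentioned in the class description, the following sketch writes a StringBuffer to a file and reads it back using only java.nio. The class XmlFileSketch and all of its methods are hypothetical stand-ins written for this example. They are not the API of FileManager, whose method signatures should be taken from its own documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for file I/O of an XML StringBuffer; not the FileManager API.
class XmlFileSketch {

    // Writes the content of the buffer to the given file as UTF-8.
    static void write(Path file, StringBuffer xml) throws IOException {
        Files.write(file, xml.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole file back into a new StringBuffer.
    static StringBuffer read(Path file) throws IOException {
        return new StringBuffer(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    // Writes the text to a temporary file, reads it back, and removes the file again.
    static String roundTrip(String xml) {
        try {
            Path tmp = Files.createTempFile("model", ".xml");
            try {
                write(tmp, new StringBuffer(xml));
                return read(tmp).toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real code the StringBuffer would come from toXML() and the buffer read back would be passed to the StringBuffer constructor of the model.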

Field Summary
protected  AlphabetContainer alphabets
          The underlying alphabets
protected  int length
          The length of the sequences the model can classify.
 
Constructor Summary
AbstractModel(AlphabetContainer alphabets, int length)
          Constructor that sets the length of the model to length and the AlphabetContainer to alphabets.
AbstractModel(StringBuffer stringBuff)
          Constructor for parsing an AbstractModel out of a StringBuffer.
 
Method Summary
 AbstractModel clone()
          Follows the conventions of Object's clone-method.
 Sample emitSample(int numberOfSequences, int... seqLength)
          This method returns a Sample object containing artificial sequence(s).
protected abstract  void fromXML(StringBuffer xml)
          This method should only be used by the constructor that works on StringBuffer.
 AlphabetContainer getAlphabetContainer()
          Returns the container of alphabets that were used when constructing the model.
 ResultSet getCharacteristics()
          Returns some information characterizing or describing the current instance of the model.
 int getLength()
          Returns the length of the sequences this model can classify.
 double[] getLogProbFor(Sample data)
          This method computes the logarithm of the probabilities of all sequences in the given sample.
 void getLogProbFor(Sample data, double[] res)
          This method computes and stores the logarithm of the probabilities for any sequence in the sample in the given double array.
 double getLogProbFor(Sequence sequence)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos)
          Returns the logarithm of the probability of the given sequence given the model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order if possible.
 double getPriorTerm()
          Returns a value that is proportional to the prior.
 double getProbFor(Sequence sequence)
          Returns the probability of the given sequence given the model.
 double getProbFor(Sequence sequence, int startpos)
          Returns the probability of the given sequence given the model.
protected  void set(AlphabetContainer abc)
          This method should only be invoked by the method setNewAlphabetContainerInstance( AlphabetContainer ) and not be made public.
 boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
          This method tries to set a new instance of an AlphabetContainer for the current model.
 void train(Sample data)
          Trains the AbstractModel object given the data as Sample.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.jstacs.models.Model
getInstanceName, getLogPriorTerm, getNumericalCharacteristics, getProbFor, isTrained, toString, train
 

Field Detail

length

protected int length
The length of the sequences the model can classify. For models that can take sequences of arbitrary length, this should be set to 0.


alphabets

protected AlphabetContainer alphabets
The underlying alphabets

Constructor Detail

AbstractModel

public AbstractModel(AlphabetContainer alphabets,
                     int length)
Constructor that sets the length of the model to length and the AlphabetContainer to alphabets.
The parameter length gives the length of the sequences the model can classify. Models that can only classify sequences of defined length are e.g. PWM or inhomogeneous Markov models. If the model can classify sequences of arbitrary length, e.g. homogeneous Markov models, this parameter must be set to 0 (zero).
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked before any evaluation (e.g. getProbFor).

Parameters:
alphabets - the alphabets
length - the length of the sequences a model can classify, 0 for arbitrary length

AbstractModel

public AbstractModel(StringBuffer stringBuff)
              throws NonParsableException
Constructor for parsing an AbstractModel out of a StringBuffer.

Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - is thrown if the StringBuffer could not be parsed
Method Detail

clone

public AbstractModel clone()
                    throws CloneNotSupportedException
Follows the conventions of Object's clone-method.

Specified by:
clone in interface Model
Overrides:
clone in class Object
Returns:
an object that is a copy of the current AbstractModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X that directly inherits from AbstractModel. Hence X's clone method should work as follows:
  1. X o = (X) super.clone();
  2. all additional member variables of o defined by X that are not of simple data types like int, double, ... have to be deeply copied
  3. return o
Throws:
CloneNotSupportedException
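
The clone convention described under Returns can be sketched as follows. BaseModelSketch and CountModelSketch are hypothetical stand-ins invented for this example (they are not Jstacs classes); the base class plays the role of AbstractModel and the subclass plays the role of X.

```java
// Hypothetical stand-in for AbstractModel, reduced to what clone() needs.
abstract class BaseModelSketch implements Cloneable {
    protected final int length; // primitive member: copied by Object.clone()

    BaseModelSketch(int length) {
        this.length = length;
    }

    @Override
    public BaseModelSketch clone() throws CloneNotSupportedException {
        // Object.clone() creates a shallow, field-by-field copy of the runtime class.
        return (BaseModelSketch) super.clone();
    }
}

// Hypothetical subclass X with one array member that needs a deep copy.
class CountModelSketch extends BaseModelSketch {
    private double[] counts;

    CountModelSketch(int length, double[] counts) {
        super(length);
        this.counts = counts.clone();
    }

    @Override
    public CountModelSketch clone() throws CloneNotSupportedException {
        CountModelSketch copy = (CountModelSketch) super.clone(); // step 1
        copy.counts = counts.clone();                             // step 2: deep copy of the array
        return copy;                                              // step 3
    }

    void setCount(int index, double value) {
        counts[index] = value;
    }

    double getCount(int index) {
        return counts[index];
    }

    // Small self-check: modifying the copy must leave the original untouched.
    static boolean copyIsIndependent() {
        try {
            CountModelSketch original = new CountModelSketch(12, new double[] {1.0, 2.0});
            CountModelSketch copy = original.clone();
            copy.setCount(0, 99.0);
            return original.getCount(0) == 1.0 && copy.getCount(0) == 99.0 && copy.length == 12;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without step 2, original and copy would share the same array, so a change made through one of them would silently show up in the other.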

train

public void train(Sample data)
           throws Exception
Description copied from interface: Model
Trains the AbstractModel object given the data as Sample.
This method should work non-incrementally. That is, after the call sequence train(data1); train(data2) the model should be fully trained on data2 only, not on data1+data2. All parameters of the model are given by the call of the constructor.

Specified by:
train in interface Model
Parameters:
data - the given sequences as Sample
Throws:
Exception - an Exception should be thrown if the training did not succeed.
See Also:
Sample.getElementAt(int), Sample.ElementEnumerator
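
The non-incremental contract of train can be illustrated with a toy model. FrequencyModelSketch is a hypothetical class written for this example, and sequences are represented as plain int arrays instead of a Sample; the only point is that each call to train discards the estimates of the previous call.

```java
import java.util.Arrays;

// Hypothetical toy model: relative symbol frequencies over an alphabet of a given size.
class FrequencyModelSketch {
    private final double[] freq;
    private boolean trained = false;

    FrequencyModelSketch(int alphabetSize) {
        freq = new double[alphabetSize];
    }

    // Non-incremental training: estimates from earlier calls are discarded first.
    void train(int[][] data) {
        Arrays.fill(freq, 0.0);
        long total = 0;
        for (int[] seq : data) {
            for (int symbol : seq) {
                freq[symbol]++;
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no symbols to train on");
        }
        for (int i = 0; i < freq.length; i++) {
            freq[i] /= total;
        }
        trained = true;
    }

    double getFrequency(int symbol) {
        if (!trained) {
            throw new IllegalStateException("model is not trained yet");
        }
        return freq[symbol];
    }
}
```

After training first on sequences that contain only symbol 0 and then on sequences that contain only symbol 1, the frequency of symbol 0 is 0, because the first data set no longer contributes.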

getProbFor

public double getProbFor(Sequence sequence)
                  throws NotTrainedException,
                         Exception
Description copied from interface: Model
Returns the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked.

Specified by:
getProbFor in interface Model
Parameters:
sequence - the sequence
Returns:
the probability or the value of the density function of the given sequence given the model
Throws:
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
Exception - an Exception should be thrown if the sequence could not be handled by the model

getProbFor

public double getProbFor(Sequence sequence,
                         int startpos)
                  throws NotTrainedException,
                         Exception
Description copied from interface: Model
Returns the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

If the length of the sequences whose probability should be returned is fixed (e.g. in an inhomogeneous model) and the given sequence is longer than this fixed length, startpos gives the start position within the given sequence. For example, if the fixed length is 12, the length of the given sequence is 30, and startpos=15, the probability of the part from position 15 to 26 (inclusive) given the model should be returned.
The length and alphabets define the type of data that can be modeled, and therefore both have to be checked.

Specified by:
getProbFor in interface Model
Parameters:
sequence - the sequence
startpos - the start position
Returns:
the probability or the value of the density function of the part of the given sequence given the model
Throws:
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
Exception - an Exception should be thrown if the sequence could not be handled by the model
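
The window arithmetic from the description of startpos can be written down as a small helper. WindowSketch and lastIndex are hypothetical names used only for this example. For a model of length 0 the sketch assumes that the window runs to the end of the sequence, which is an assumption of this example and not something the description above states.

```java
// Hypothetical helper illustrating the window selected by startpos.
class WindowSketch {

    // Returns the inclusive index of the last position that is evaluated.
    // modelLength == 0 stands for a model that accepts arbitrary lengths;
    // this sketch then assumes the window extends to the end of the sequence.
    static int lastIndex(int modelLength, int sequenceLength, int startpos) {
        if (startpos < 0 || startpos >= sequenceLength) {
            throw new IllegalArgumentException("startpos outside the sequence");
        }
        int end = (modelLength == 0) ? sequenceLength - 1 : startpos + modelLength - 1;
        if (end >= sequenceLength) {
            throw new IllegalArgumentException("window exceeds the sequence");
        }
        return end;
    }
}
```

With a fixed length of 12, a sequence of length 30, and startpos=15, the helper returns 26, which matches the example in the description. A start position of 20 is rejected because the window would end at position 31.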

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws Exception
Description copied from interface: Model
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

For more details see Model.getProbFor(Sequence, int, int)

Specified by:
getLogProbFor in interface Model
Parameters:
sequence - the sequence
startpos - the start position
endpos - the last position to be taken into account
Returns:
the logarithm of probability or the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled (e.g. startpos > endpos, endpos > sequence.length, ...) by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
Model.getProbFor(Sequence, int, int)
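
One common reason to work with log-probabilities is numerical: a product of many small probabilities underflows in double precision, while the sum of their logarithms stays finite. The sketch below shows this with a hypothetical toy model that assigns one fixed probability to each symbol, independent of position. LogProbSketch and its methods are invented for this example, and endpos is treated as inclusive, following the parameter description above.

```java
// Hypothetical toy model with one fixed probability per symbol, independent of position.
class LogProbSketch {

    // Rejects ranges that cannot be evaluated; endpos is inclusive.
    private static void checkRange(int[] seq, int startpos, int endpos) {
        if (startpos < 0 || startpos > endpos || endpos >= seq.length) {
            throw new IllegalArgumentException("invalid range");
        }
    }

    // Sum of the log-probabilities over the inclusive range [startpos, endpos].
    static double logProb(double[] symbolProb, int[] seq, int startpos, int endpos) {
        checkRange(seq, startpos, endpos);
        double sum = 0.0;
        for (int i = startpos; i <= endpos; i++) {
            sum += Math.log(symbolProb[seq[i]]);
        }
        return sum;
    }

    // Plain product over the same range; underflows to 0.0 for long ranges.
    static double prob(double[] symbolProb, int[] seq, int startpos, int endpos) {
        checkRange(seq, startpos, endpos);
        double product = 1.0;
        for (int i = startpos; i <= endpos; i++) {
            product *= symbolProb[seq[i]];
        }
        return product;
    }
}
```

For a sequence of 2000 symbols with probability 0.25 each, prob returns 0.0 because the true value is far below the smallest positive double, whereas logProb returns a finite value of about -2772.59.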

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos)
                     throws Exception
Description copied from interface: Model
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

For more details see Model.getProbFor(Sequence, int)

Specified by:
getLogProbFor in interface Model
Parameters:
sequence - the sequence
startpos - the start position
Returns:
the logarithm of probability or the value of the density function of the part of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
Model.getProbFor(Sequence, int)

getLogProbFor

public double getLogProbFor(Sequence sequence)
                     throws Exception
Description copied from interface: Model
Returns the logarithm of the probability of the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

For more details see Model.getProbFor(Sequence)

Specified by:
getLogProbFor in interface Model
Parameters:
sequence - the sequence
Returns:
the logarithm of the probability or the value of the density function of the given sequence given the model
Throws:
Exception - an Exception should be thrown if the sequence could not be handled by the model
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
See Also:
Model.getProbFor(Sequence)

getLogProbFor

public double[] getLogProbFor(Sample data)
                       throws Exception
Description copied from interface: Model
This method computes the logarithm of the probabilities of all sequences in the given sample. The values are stored in the array according to the index of the sequence in the sample.

The probability for any sequence shall be computed independently of all other sequences in the sample. So the result should be exactly the same as for the method getLogProbFor(Sequence).

Specified by:
getLogProbFor in interface Model
Parameters:
data - the sample
Returns:
an array containing the logarithm of the probabilities of all sequences of the sample
Throws:
Exception - if something went wrong
See Also:
Model.getLogProbFor(Sequence)

getLogProbFor

public void getLogProbFor(Sample data,
                          double[] res)
                   throws Exception
Description copied from interface: Model
This method computes and stores the logarithm of the probabilities for any sequence in the sample in the given double array.

The probability for any sequence shall be computed independently of all other sequences in the sample. So the result should be exactly the same as for the method getLogProbFor(Sequence).

Specified by:
getLogProbFor in interface Model
Parameters:
data - the sample
res - the array for the results, has to have length data.getNumberOfElements()
Throws:
Exception - if something went wrong
See Also:
Model.getLogProbFor(Sample)
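
The relation between the two sample-based variants and the single-sequence method can be sketched as follows. BatchLogProbSketch is a hypothetical class for this example: a sample is represented as an int[][] and the single-sequence score is a stand-in that assumes a uniform distribution over four symbols. Only the structure is the point, namely that each entry of the result is computed by the single-sequence method and that res must have one slot per sequence.

```java
// Hypothetical sketch of the array-based variants built on a single-sequence score.
class BatchLogProbSketch {

    // Stand-in for getLogProbFor(Sequence): uniform distribution over four symbols (assumption).
    static double logProb(int[] seq) {
        return seq.length * Math.log(0.25);
    }

    // Stores one result per sequence in res; res must have exactly data.length entries.
    static void logProb(int[][] data, double[] res) {
        if (res.length != data.length) {
            throw new IllegalArgumentException("res must have one entry per sequence");
        }
        for (int i = 0; i < data.length; i++) {
            res[i] = logProb(data[i]); // each sequence is scored independently of the others
        }
    }

    // Convenience variant that allocates the result array itself.
    static double[] logProb(int[][] data) {
        double[] res = new double[data.length];
        logProb(data, res);
        return res;
    }
}
```

Passing a caller-owned array allows the same buffer to be reused across calls, which may be the motivation for offering both variants.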

getPriorTerm

public double getPriorTerm()
                    throws Exception
Description copied from interface: Model
Returns a value that is proportional to the prior. For maximum likelihood (ML), 1 should be returned.

Specified by:
getPriorTerm in interface Model
Returns:
a value that is proportional to the prior
Throws:
Exception - if something went wrong

emitSample

public Sample emitSample(int numberOfSequences,
                         int... seqLength)
                  throws NotTrainedException,
                         Exception
Description copied from interface: Model
This method returns a Sample object containing artificial sequence(s).

There are two ways to create a sample for a model with length 0:
  1. emitSample( int n, int l ) should return a sample with n sequences of length l.
  2. emitSample( int n, int[] l ) should return a sample with n sequences, where the length of each sequence is given by the corresponding entry of the array.

There are two ways to create a sample for a model with length greater than 0: emitSample( int n ) and emitSample( int n, null ) should both return a sample with n sequences whose length is the length of the model (Model.getLength()).

The standard implementation throws an Exception.

Specified by:
emitSample in interface Model
Parameters:
numberOfSequences - the number of sequences that should be contained in the returned sample
seqLength - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
Sample containing the artificial sequence(s)
Throws:
NotTrainedException - a NotTrainedException should be thrown if the model is not trained yet.
Exception - an Exception should be thrown if the emission did not succeed.
See Also:
Sample
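
The calling conventions for the varargs parameter seqLength can be summarized in a helper that resolves one length per emitted sequence. EmitSketch and resolveLengths are hypothetical names for this example. How an actual model reacts to an unexpected argument is not specified above, so rejecting it with an IllegalArgumentException is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper that turns the varargs argument into one length per sequence.
class EmitSketch {

    static int[] resolveLengths(int modelLength, int numberOfSequences, int... seqLength) {
        int[] lengths = new int[numberOfSequences];
        boolean noLengthGiven = seqLength == null || seqLength.length == 0;
        if (modelLength > 0) {
            // Fixed-length model: seqLength has to be null or empty.
            if (!noLengthGiven) {
                throw new IllegalArgumentException("a fixed-length model takes no seqLength");
            }
            Arrays.fill(lengths, modelLength);
        } else if (!noLengthGiven && seqLength.length == 1) {
            // One value: all sequences get the same length.
            Arrays.fill(lengths, seqLength[0]);
        } else if (!noLengthGiven && seqLength.length == numberOfSequences) {
            // One value per sequence.
            System.arraycopy(seqLength, 0, lengths, 0, numberOfSequences);
        } else {
            throw new IllegalArgumentException("seqLength does not match numberOfSequences");
        }
        return lengths;
    }
}
```

For example, resolveLengths(0, 3, 5) yields three sequences of length 5, resolveLengths(0, 2, 4, 7) yields lengths 4 and 7, and resolveLengths(12, 3) yields three sequences of length 12. Omitting the varargs argument passes an empty array, while an explicit (int[]) null passes null, so both cases are checked.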

getAlphabetContainer

public final AlphabetContainer getAlphabetContainer()
Description copied from interface: Model
Returns the container of alphabets that were used when constructing the model.

Specified by:
getAlphabetContainer in interface Model
Returns:
the alphabet

getLength

public final int getLength()
Description copied from interface: Model
Returns the length of the sequences this model can classify. Models that can only classify sequences of defined length are e.g. PWM or inhomogeneous Markov models. If the model can classify sequences of arbitrary length, e.g. homogeneous Markov models, this method returns 0 (zero).

Specified by:
getLength in interface Model
Returns:
the length

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
Description copied from interface: Model
This method returns the maximal used Markov order if possible.

Specified by:
getMaximalMarkovOrder in interface Model
Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer
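
Because UnsupportedOperationException is unchecked, the compiler does not force callers to handle it. A caller that wants to work with arbitrary models might guard the call as in the following sketch. OrderSourceSketch and MarkovOrderSketch are hypothetical names for this example, and the interface only mirrors the signature documented above.

```java
// Hypothetical minimal view of a model that may or may not report its Markov order.
interface OrderSourceSketch {
    byte getMaximalMarkovOrder() throws UnsupportedOperationException;
}

class MarkovOrderSketch {

    // Returns the reported order, or the fallback if the model cannot give a proper answer.
    static int orderOrDefault(OrderSourceSketch model, int fallback) {
        try {
            return model.getMaximalMarkovOrder();
        } catch (UnsupportedOperationException e) {
            return fallback;
        }
    }
}
```

A negative fallback such as -1 can serve as an "unknown" marker as long as the caller treats it as such.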

getCharacteristics

public ResultSet getCharacteristics()
                             throws Exception
Description copied from interface: Model
Returns some information characterizing or describing the current instance of the model. This could be e.g. the number of edges for a Bayesian network or an image showing some representation of the model. The set of characteristics should always include the XML-representation of the model. The corresponding result type is ObjectResult

Specified by:
getCharacteristics in interface Model
Returns:
the characteristics
Throws:
Exception - an Exception is thrown if some of the characteristics could not be defined
See Also:
StorableResult

fromXML

protected abstract void fromXML(StringBuffer xml)
                         throws NonParsableException
This method should only be used by the constructor that works on StringBuffer. It is the counterpart of toXML().

Parameters:
xml - the representation
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractModel(StringBuffer)
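
The interplay of the StringBuffer constructor, fromXML, and toXML can be sketched with two hypothetical classes. ParsedModelSketch stands in for AbstractModel and OrderModelSketch for a concrete model; the one-tag format and the use of IllegalArgumentException in place of NonParsableException are simplifications made for this example. The sketch also shows a general Java pitfall of calling an overridable method from a constructor: field initializers of the subclass run only after the superclass constructor has returned.

```java
// Hypothetical stand-in for AbstractModel: the constructor delegates parsing to fromXML.
abstract class ParsedModelSketch {

    ParsedModelSketch(StringBuffer xml) {
        fromXML(xml); // runs before any field initializer of the subclass
    }

    protected abstract void fromXML(StringBuffer xml);

    public abstract StringBuffer toXML();
}

// Hypothetical concrete model that stores a single integer.
class OrderModelSketch extends ParsedModelSketch {

    // Deliberately declared without an initializer: an initializer such as "= 1"
    // would run after the super constructor and overwrite the parsed value.
    private int order;

    OrderModelSketch(StringBuffer xml) {
        super(xml);
    }

    @Override
    protected void fromXML(StringBuffer xml) {
        String s = xml.toString();
        String open = "<order>";
        String close = "</order>";
        int from = s.indexOf(open);
        int to = s.indexOf(close);
        if (from < 0 || to < from) {
            throw new IllegalArgumentException("representation is not parsable");
        }
        order = Integer.parseInt(s.substring(from + open.length(), to).trim());
    }

    @Override
    public StringBuffer toXML() {
        return new StringBuffer("<order>").append(order).append("</order>");
    }

    int getOrder() {
        return order;
    }
}
```

A model built from the output of toXML() should be equivalent to the model that produced it, which is what makes the two methods counterparts.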

setNewAlphabetContainerInstance

public final boolean setNewAlphabetContainerInstance(AlphabetContainer abc)
Description copied from interface: Model
This method tries to set a new instance of an AlphabetContainer for the current model. This instance has to be consistent with the underlying instance of an AlphabetContainer.

This method can be very useful to save time.

Specified by:
setNewAlphabetContainerInstance in interface Model
Parameters:
abc - the alphabets
Returns:
true if the new instance could be set
See Also:
Model.getAlphabetContainer(), AlphabetContainer.checkConsistency(AlphabetContainer)

set

protected void set(AlphabetContainer abc)
This method should only be invoked by the method setNewAlphabetContainerInstance( AlphabetContainer ) and not be made public.

It enables you to do more with the method setNewAlphabetContainerInstance( AlphabetContainer ), e.g. setting a new AlphabetContainer instance for subcomponents.

Parameters:
abc - the new instance
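
One way to read the relation between the final method setNewAlphabetContainerInstance and the protected method set is as a template method with a hook. The sketch below follows that reading. All classes are hypothetical stand-ins (AlphabetSketch reduces consistency to equal alphabet size), and the actual order of operations inside AbstractModel may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AlphabetContainer; consistency is reduced to equal size.
final class AlphabetSketch {
    final int size;

    AlphabetSketch(int size) {
        this.size = size;
    }

    boolean checkConsistency(AlphabetSketch other) {
        return other != null && size == other.size;
    }
}

// Hypothetical stand-in for AbstractModel.
class ComponentSketch {
    protected AlphabetSketch alphabets;

    ComponentSketch(AlphabetSketch alphabets) {
        this.alphabets = alphabets;
    }

    // Final entry point: checks consistency, calls the hook, then swaps the own reference.
    public final boolean setNewAlphabetContainerInstance(AlphabetSketch abc) {
        if (!alphabets.checkConsistency(abc)) {
            return false;
        }
        set(abc);
        alphabets = abc;
        return true;
    }

    // Hook for subclasses; does nothing by default and stays protected.
    protected void set(AlphabetSketch abc) {
    }
}

// Hypothetical composite model that forwards the new instance to its subcomponents.
class MixtureSketch extends ComponentSketch {
    private final List<ComponentSketch> parts = new ArrayList<>();

    MixtureSketch(AlphabetSketch alphabets, int numberOfParts) {
        super(alphabets);
        for (int i = 0; i < numberOfParts; i++) {
            parts.add(new ComponentSketch(alphabets));
        }
    }

    @Override
    protected void set(AlphabetSketch abc) {
        for (ComponentSketch part : parts) {
            part.setNewAlphabetContainerInstance(abc);
        }
    }

    // True if this model and all of its parts reference exactly the given instance.
    boolean allShare(AlphabetSketch abc) {
        for (ComponentSketch part : parts) {
            if (part.alphabets != abc) {
                return false;
            }
        }
        return alphabets == abc;
    }
}
```

Keeping the public method final ensures that the consistency check cannot be bypassed, while the hook still lets composite models keep their subcomponents on the same instance.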