de.jstacs.models.discrete.homogeneous
Class HomogeneousModel

java.lang.Object
  extended by de.jstacs.models.AbstractModel
      extended by de.jstacs.models.discrete.DiscreteGraphicalModel
          extended by de.jstacs.models.discrete.homogeneous.HomogeneousModel
All Implemented Interfaces:
InstantiableFromParameterSet, Model, Storable, Cloneable
Direct Known Subclasses:
HomogeneousMM

public abstract class HomogeneousModel
extends DiscreteGraphicalModel

This class implements homogeneous models of arbitrary order.

Author:
Jens Keilwagen
See Also:
HomogeneousModelParameterSet

Nested Class Summary
protected  class HomogeneousModel.HomCondProb
          This class handles the (conditional) probabilities of a homogeneous model in a fast way.
 
Field Summary
protected  byte order
          The order of the model.
protected  int[] powers
          The powers of the alphabet length.
 
Fields inherited from class de.jstacs.models.discrete.DiscreteGraphicalModel
params, trained
 
Fields inherited from class de.jstacs.models.AbstractModel
alphabets, length
 
Constructor Summary
HomogeneousModel(HomogeneousModelParameterSet params)
          Creates a homogeneous model from a parameter set.
HomogeneousModel(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
protected  void check(Sequence sequence, int startpos, int endpos)
          Checks some constraints; in general, these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos.
protected  int chooseFromDistr(Constraint distr, int start, int end, double randNo)
          Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end.
protected  HomogeneousModel.HomCondProb[] cloneHomProb(HomogeneousModel.HomCondProb[] p)
          Clones the given array of conditional probabilities.
 Sample emitSample(int no, int... length)
          Creates a Sample of a given number of Sequences from a trained homogeneous model.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by Model.getCharacteristics().
 double getProbFor(Sequence sequence, int startpos, int endpos)
          Returns the probability of (a part of) the given sequence given the model.
protected abstract  Sequence getRandomSequence(Random r, int length)
          This method creates a random Sequence from a trained homogeneous model.
protected abstract  double logProbFor(Sequence sequence, int startpos, int endpos)
          This method computes the logarithm of the probability of the given Sequence in the given interval.
protected abstract  double probFor(Sequence sequence, int startpos, int endpos)
          This method computes the probability of the given Sequence in the given interval.
protected  void set(DGMParameterSet params, boolean trained)
          Sets the parameters as internal parameters and does some essential computations.
 void train(Sample[] data)
          Trains the homogeneous model on all given Samples.
abstract  void train(Sample[] data, double[][] weights)
          Trains the homogeneous model using an array of weighted Samples.
 
Methods inherited from class de.jstacs.models.discrete.DiscreteGraphicalModel
clone, fromXML, getCurrentParameterSet, getDescription, getESS, getFurtherModelInfos, getXMLTag, isTrained, setFurtherModelInfos, toString, toXML
 
Methods inherited from class de.jstacs.models.AbstractModel
getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogProbFor, getLogProbFor, getPriorTerm, getProbFor, getProbFor, set, setNewAlphabetContainerInstance, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.jstacs.models.Model
getInstanceName, getLogPriorTerm, train
 

Field Detail

powers

protected int[] powers
The powers of the alphabet length.
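
To illustrate how a table of powers of the alphabet length can be used, the following sketch precomputes the powers and maps a window of symbols to an array index. The class PowersSketch, its method names, and the layout powers[i] = alphabetSize^i are assumptions made for illustration only; they are not taken from the Jstacs sources.

```java
// Hypothetical illustration; not part of Jstacs.
final class PowersSketch {

    // Assumed layout: powers[i] = alphabetSize^i for i = 0 .. order + 1.
    static int[] powers(int alphabetSize, int order) {
        int[] p = new int[order + 2];
        p[0] = 1;
        for (int i = 1; i < p.length; i++) {
            p[i] = p[i - 1] * alphabetSize;
        }
        return p;
    }

    // Reads symbols[start..end] (inclusive) as a number in base alphabetSize,
    // with the symbol at position end as the least significant digit.
    static int index(int[] symbols, int start, int end, int[] powers) {
        int idx = 0;
        for (int i = start; i <= end; i++) {
            idx += symbols[i] * powers[end - i];
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] p = powers(4, 2);
        System.out.println(index(new int[] {2, 0, 3}, 0, 2, p));
    }
}
```

With an alphabet of size 4 and order 2, a context plus the current symbol covers 4^3 = 64 combinations, so a flat array indexed this way can hold all conditional probabilities of that order.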


order

protected byte order
The order of the model.

Constructor Detail

HomogeneousModel

public HomogeneousModel(HomogeneousModelParameterSet params)
                 throws CloneNotSupportedException,
                        IllegalArgumentException,
                        NonParsableException
Creates a homogeneous model from a parameter set.

Parameters:
params - the parameter set
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
IllegalArgumentException - if the parameter set is not instantiated
NonParsableException - if the parameter set is not parsable
See Also:
HomogeneousModelParameterSet, DiscreteGraphicalModel.DiscreteGraphicalModel(DGMParameterSet)

HomogeneousModel

public HomogeneousModel(StringBuffer stringBuff)
                 throws NonParsableException
The standard constructor for the interface Storable. Creates a new HomogeneousModel from its XML representation.

Parameters:
stringBuff - the XML representation as StringBuffer
Throws:
NonParsableException - if the HomogeneousModel could not be reconstructed out of the XML representation (the StringBuffer could not be parsed)
See Also:
Storable, DiscreteGraphicalModel.DiscreteGraphicalModel(StringBuffer)
Method Detail

emitSample

public final Sample emitSample(int no,
                               int... length)
                        throws NotTrainedException,
                               IllegalArgumentException,
                               EmptySampleException,
                               WrongAlphabetException,
                               WrongSequenceTypeException
Creates a Sample of a given number of Sequences from a trained homogeneous model.

Specified by:
emitSample in interface Model
Overrides:
emitSample in class AbstractModel
Parameters:
no - the number of Sequences that should be in the Sample
length - the length of all Sequences or an array of lengths with the Sequence with index i having length length[i]
Returns:
the created Sample
Throws:
NotTrainedException - if the model was not trained
IllegalArgumentException - if the dimension of length is neither 1 nor no
EmptySampleException - if no == 0
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet
See Also:
Sample.Sample(String, Sequence...)
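
The rule for the length argument (either one length for all Sequences or one length per Sequence) can be sketched as follows. The helper resolveLengths and the class EmitLengthSketch are hypothetical names; the sketch covers only the dimension check described above and does not model EmptySampleException or the actual sampling.

```java
// Hypothetical illustration; not part of Jstacs.
final class EmitLengthSketch {

    // Expands the varargs into exactly one length per sequence.
    static int[] resolveLengths(int no, int... length) {
        if (length.length != 1 && length.length != no) {
            throw new IllegalArgumentException(
                "length must have dimension 1 or " + no + ", but has " + length.length);
        }
        int[] result = new int[no];
        for (int i = 0; i < no; i++) {
            // A single value applies to all sequences; otherwise use length[i].
            result[i] = (length.length == 1) ? length[0] : length[i];
        }
        return result;
    }
}
```

For example, resolveLengths(3, 5) yields three sequences of length 5, while resolveLengths(2, 4, 7) yields lengths 4 and 7.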

getRandomSequence

protected abstract Sequence getRandomSequence(Random r,
                                              int length)
                                       throws WrongAlphabetException,
                                              WrongSequenceTypeException
This method creates a random Sequence from a trained homogeneous model.

Parameters:
r - the random generator
length - the length of the Sequence
Returns:
the created Sequence
Throws:
WrongSequenceTypeException - if the Sequence type is not suitable (for the AlphabetContainer)
WrongAlphabetException - if something is wrong with the alphabet
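
As a rough idea of what an implementation of this method might do, the following sketch draws a sequence from a homogeneous Markov chain of order 1. It works on plain int arrays instead of Sequence objects; the class RandomSequenceSketch, the parameters start and trans, and the helper draw are assumptions for illustration and do not reflect the Jstacs implementation.

```java
// Hypothetical illustration; not part of Jstacs.
final class RandomSequenceSketch {

    // Inverse-CDF draw: returns the first index whose cumulative
    // probability exceeds u, clamped to the last index.
    static int draw(double[] probs, double u) {
        int i = 0;
        double rest = u - probs[0];
        while (i < probs.length - 1 && rest >= 0) {
            i++;
            rest -= probs[i];
        }
        return i;
    }

    // start[a]    : probability of symbol a at the first position
    // trans[a][b] : probability of symbol b given the previous symbol a
    static int[] randomSequence(java.util.Random r, int length,
                                double[] start, double[][] trans) {
        int[] seq = new int[length];
        for (int l = 0; l < length; l++) {
            double[] probs = (l == 0) ? start : trans[seq[l - 1]];
            seq[l] = draw(probs, r.nextDouble());
        }
        return seq;
    }
}
```

Because the transition probabilities do not depend on the position, the same two tables can emit sequences of any requested length.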

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
Description copied from interface: Model
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface Model
Overrides:
getMaximalMarkovOrder in class AbstractModel
Returns:
maximal used Markov order

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from interface: Model
Returns the subset of numerical values that are also returned by Model.getCharacteristics().

Returns:
the numerical characteristics of the current instance of the model
Throws:
Exception - if some of the characteristics could not be defined

getLogProbFor

public final double getLogProbFor(Sequence sequence,
                                  int startpos,
                                  int endpos)
                           throws NotTrainedException,
                                  Exception
Description copied from interface: Model
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

For more details see Model.getProbFor(Sequence, int, int)

Specified by:
getLogProbFor in interface Model
Overrides:
getLogProbFor in class AbstractModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > endpos, endpos > sequence.length, ...) by the model
See Also:
Model.getProbFor(Sequence, int, int)

getProbFor

public final double getProbFor(Sequence sequence,
                               int startpos,
                               int endpos)
                        throws NotTrainedException,
                               Exception
Description copied from interface: Model
Returns the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

This method extends Model.getProbFor(Sequence, int) to models, such as homogeneous ones, for which the length of the sequences to be scored is not fixed. In addition to the start position, the end position of the part of the given sequence is given, and the probability of the part from position startpos to endpos (inclusive) is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > endpos, endpos > sequence.length, ...) by the model

train

public void train(Sample[] data)
           throws Exception
Trains the homogeneous model on all given Samples.

Parameters:
data - the given Samples
Throws:
Exception - if something went wrong
See Also:
train(Sample[], double[][])

train

public abstract void train(Sample[] data,
                           double[][] weights)
                    throws Exception
Trains the homogeneous model using an array of weighted Samples. The Sequence weights in weights[i] are for the Sample in data[i].

Parameters:
data - the given Samples
weights - the weights
Throws:
Exception - if something went wrong; furthermore, data.length has to be equal to weights.length
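
The pairing of weights[i] with data[i] can be illustrated with a weighted frequency estimate for the simplest case, a model of order 0. The sketch below uses nested int arrays in place of Sample objects; the class WeightedTrainSketch and the method estimate are hypothetical, and a concrete subclass may estimate its parameters differently (for example with pseudocounts).

```java
// Hypothetical illustration; not part of Jstacs.
final class WeightedTrainSketch {

    // data[s][n]    : symbols of sequence n in sample s
    // weights[s][n] : weight of sequence n in sample s
    static double[] estimate(int[][][] data, double[][] weights, int alphabetSize) {
        if (data.length != weights.length) {
            throw new IllegalArgumentException("data.length must equal weights.length");
        }
        double[] freq = new double[alphabetSize];
        double total = 0;
        for (int s = 0; s < data.length; s++) {
            for (int n = 0; n < data[s].length; n++) {
                double w = weights[s][n];
                for (int symbol : data[s][n]) {
                    freq[symbol] += w; // every symbol counts with its sequence weight
                    total += w;
                }
            }
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no weighted observations");
        }
        for (int a = 0; a < alphabetSize; a++) {
            freq[a] /= total;
        }
        return freq;
    }
}
```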

set

protected void set(DGMParameterSet params,
                   boolean trained)
            throws CloneNotSupportedException,
                   NonParsableException
Description copied from class: DiscreteGraphicalModel
Sets the parameters as internal parameters and does some essential computations. Used in fromParameterSet-methods.

Overrides:
set in class DiscreteGraphicalModel
Parameters:
params - the new ParameterSet
trained - indicates if the model is trained or not
Throws:
CloneNotSupportedException - if the parameter set could not be cloned
NonParsableException - if the parameters of the model could not be parsed

check

protected void check(Sequence sequence,
                     int startpos,
                     int endpos)
              throws NotTrainedException,
                     IllegalArgumentException
Checks some constraints; in general, these are conditions on the AlphabetContainer of a (sub)Sequence between startpos and endpos.

Overrides:
check in class DiscreteGraphicalModel
Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Throws:
NotTrainedException - if the model is not trained
IllegalArgumentException - if some arguments are wrong
See Also:
DiscreteGraphicalModel.check(Sequence, int, int)

chooseFromDistr

protected final int chooseFromDistr(Constraint distr,
                                    int start,
                                    int end,
                                    double randNo)
Chooses a value in [0,end-start] according to the distribution encoded in the frequencies of distr between the indices start and end.

The instance distr is not changed in the process.

Parameters:
distr - the distribution
start - the start index
end - the end index
randNo - a random number in [0,1]
Returns:
the chosen value
See Also:
Constraint.getFreq(int)
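
The selection described above can be sketched as a walk along the cumulative frequencies. The sketch replaces the Constraint instance with a plain double array; the class ChooseSketch and the method choose are hypothetical, and the sketch assumes that the frequencies between start and end are non-negative and sum to approximately 1.

```java
// Hypothetical illustration; not part of Jstacs.
final class ChooseSketch {

    // Returns a value in [0, end - start]: the offset of the first index
    // at which the cumulative frequency exceeds randNo. The array freq is
    // only read, never modified.
    static int choose(double[] freq, int start, int end, double randNo) {
        int i = start;
        double rest = randNo - freq[i];
        while (i < end && rest >= 0) {
            i++;
            rest -= freq[i];
        }
        // If rounding leaves rest >= 0 at the last index, the result is
        // clamped to end - start.
        return i - start;
    }
}
```

For the frequencies {0.1, 0.2, 0.3, 0.4} with start = 0 and end = 3, a random number of 0.25 falls into the second interval [0.1, 0.3), so the chosen value is 1.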

logProbFor

protected abstract double logProbFor(Sequence sequence,
                                     int startpos,
                                     int endpos)
This method computes the logarithm of the probability of the given Sequence in the given interval. The method is only used in Model.getLogProbFor(Sequence, int, int) after the method check(Sequence, int, int) has been invoked.

Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Returns:
the logarithm of the probability for the given subsequence
See Also:
check(Sequence, int, int), Model.getLogProbFor(Sequence, int, int)
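
The division of labour described above (a final public method that first validates and then delegates to an abstract method) is the template method pattern. The following sketch shows that structure with a model of order 0 as the concrete case. All class names are hypothetical, int arrays stand in for Sequence objects, and the exact bounds tested in check are an assumption.

```java
// Hypothetical illustration; not part of Jstacs.
abstract class LogProbTemplateSketch {

    protected boolean trained = true;

    // Public entry point: validates the arguments, then delegates.
    public final double getLogProbFor(int[] sequence, int startpos, int endpos) {
        check(sequence, startpos, endpos);
        return logProbFor(sequence, startpos, endpos);
    }

    // Assumed constraints: trained model and a valid inclusive interval.
    protected void check(int[] sequence, int startpos, int endpos) {
        if (!trained) {
            throw new IllegalStateException("model is not trained");
        }
        if (startpos < 0 || startpos > endpos || endpos >= sequence.length) {
            throw new IllegalArgumentException("invalid interval");
        }
    }

    // May rely on check having been called already.
    protected abstract double logProbFor(int[] sequence, int startpos, int endpos);
}

final class OrderZeroSketch extends LogProbTemplateSketch {

    private final double[] logProbs;

    OrderZeroSketch(double[] probs) {
        logProbs = new double[probs.length];
        for (int a = 0; a < probs.length; a++) {
            logProbs[a] = Math.log(probs[a]);
        }
    }

    // Sum of per-symbol log probabilities over [startpos, endpos].
    @Override
    protected double logProbFor(int[] sequence, int startpos, int endpos) {
        double sum = 0;
        for (int i = startpos; i <= endpos; i++) {
            sum += logProbs[sequence[i]];
        }
        return sum;
    }
}
```

Keeping the validation in one final method means that subclasses only have to implement the numerical part and cannot accidentally skip the checks.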

probFor

protected abstract double probFor(Sequence sequence,
                                  int startpos,
                                  int endpos)
This method computes the probability of the given Sequence in the given interval. The method is only used in Model.getProbFor(Sequence, int, int) after the method check(Sequence, int, int) has been invoked.

Parameters:
sequence - the Sequence
startpos - the start position within the Sequence
endpos - the end position within the Sequence
Returns:
the probability for the given subsequence
See Also:
check(Sequence, int, int), Model.getProbFor(Sequence, int, int)
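
For a homogeneous Markov chain of order 1, the probability of a subsequence is the probability of its first symbol multiplied by one transition probability per following position. The sketch below illustrates this for an inclusive interval; the class ProbForSketch and the parameters start and trans are hypothetical and do not reflect how a Jstacs subclass stores its parameters.

```java
// Hypothetical illustration; not part of Jstacs.
final class ProbForSketch {

    // start[a]    : probability of symbol a at the first scored position
    // trans[a][b] : probability of symbol b given the previous symbol a
    static double probFor(int[] seq, int startpos, int endpos,
                          double[] start, double[][] trans) {
        double p = start[seq[startpos]];
        for (int i = startpos + 1; i <= endpos; i++) {
            // The same transition table is used at every position.
            p *= trans[seq[i - 1]][seq[i]];
        }
        return p;
    }
}
```

For long sequences such a product of many small factors can underflow to 0, which is one reason to prefer the logarithmic variant in practice.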

cloneHomProb

protected HomogeneousModel.HomCondProb[] cloneHomProb(HomogeneousModel.HomCondProb[] p)
Clones the given array of conditional probabilities.

Parameters:
p - the original conditional probabilities
Returns:
an array of clones