de.jstacs.sequenceScores.statisticalModels.trainable
Class UniformTrainSM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.UniformTrainSM
All Implemented Interfaces:
SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable

public class UniformTrainSM
extends AbstractTrainableStatisticalModel

This class represents a uniform model, sometimes also called an uninformed model. It can be used if nothing is known about a statistical process.

Author:
Jens Keilwagen

Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
UniformTrainSM(AlphabetContainer alphabet)
          Creates a new UniformTrainSM using a given AlphabetContainer.
UniformTrainSM(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
 
Method Summary
 UniformTrainSM clone()
          Follows the conventions of Object's clone()-method.
 DataSet emitDataSet(int n, int... lengths)
          This method returns a DataSet object containing artificial sequence(s).
 void fromXML(StringBuffer representation)
          This method should only be used by the constructor that works on a StringBuffer.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 byte getMaximalMarkovOrder()
          This method returns the maximal used Markov order, if possible.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
 boolean isInitialized()
          Returns true if the model is trained, false otherwise.
 String toString(NumberFormat nf)
          Returns the String "".
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 void train(DataSet data, double[] weights)
          Deprecated. 
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

UniformTrainSM

public UniformTrainSM(AlphabetContainer alphabet)
Creates a new UniformTrainSM using a given AlphabetContainer.

Parameters:
alphabet - the alphabets used in the model

UniformTrainSM

public UniformTrainSM(StringBuffer stringBuff)
               throws NonParsableException
The standard constructor for the interface Storable. Creates a new UniformTrainSM out of a StringBuffer.

Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - if the StringBuffer is not parsable
Method Detail

clone

public UniformTrainSM clone()
                     throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class AbstractTrainableStatisticalModel
Returns:
an object that is a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer is not deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X that directly inherits from AbstractTrainableStatisticalModel. Hence X's clone() method should work as follows:
1. X o = (X) super.clone();
2. all additional member variables of o defined by X that are not of simple data types such as int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
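
The three cloning steps listed under "Returns" can be sketched in plain Java. This is only an illustration under stated assumptions: BaseModelSketch and CountingModelSketch are hypothetical stand-ins, not Jstacs classes, and the counts array stands for any member that is not a simple data type.

```java
import java.util.Arrays;

// Hypothetical stand-in for a base class whose clone() makes a shallow copy.
abstract class BaseModelSketch implements Cloneable {
    protected final int length; // simple data type: copied by Object.clone()

    BaseModelSketch(int length) {
        this.length = length;
    }

    @Override
    public BaseModelSketch clone() throws CloneNotSupportedException {
        return (BaseModelSketch) super.clone();
    }
}

// Hypothetical subclass "X" with one mutable member that needs a deep copy.
class CountingModelSketch extends BaseModelSketch {
    double[] counts;

    CountingModelSketch(int length, double[] counts) {
        super(length);
        this.counts = counts;
    }

    @Override
    public CountingModelSketch clone() throws CloneNotSupportedException {
        // Step 1: shallow copy via the superclass.
        CountingModelSketch o = (CountingModelSketch) super.clone();
        // Step 2: deep copy members that are not simple data types.
        o.counts = Arrays.copyOf(counts, counts.length);
        // Step 3: return the copy.
        return o;
    }
}
```

With this pattern, changing counts on the clone leaves the original instance untouched, while the final field length is shared by value.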

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws IllegalArgumentException,
                            WrongAlphabetException
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous, the value of the density function is returned.

This method extends StatisticalModel.getLogProbFor(Sequence, int): since the model may be, e.g., homogeneous, the length of the sequences whose probability is returned is not fixed. The end position of the part of the given sequence is therefore given as well, and the probability of the part from position startpos to endpos (inclusive) is returned.
The length and the alphabets define the type of data that can be modeled, and therefore both have to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
IllegalArgumentException
WrongAlphabetException
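
For a uniform model, the log probability of a subsequence depends only on its length and the alphabet size. The following is a minimal sketch of that arithmetic, assuming a single discrete alphabet and independent, uniformly distributed positions. UniformLogProbSketch and its parameters are hypothetical and do not reproduce the Jstacs implementation. In particular, the bounds check here is an assumption about what "has to be checked" might look like.

```java
class UniformLogProbSketch {

    // Assumption: each position is drawn independently and uniformly from
    // an alphabet with alphabetSize symbols, so
    // log P = -(endpos - startpos + 1) * log(alphabetSize).
    static double logProb(int alphabetSize, int sequenceLength, int startpos, int endpos) {
        if (alphabetSize < 1) {
            throw new IllegalArgumentException("alphabet size must be positive");
        }
        if (startpos < 0 || endpos >= sequenceLength || startpos > endpos) {
            throw new IllegalArgumentException("invalid start or end position");
        }
        int positions = endpos - startpos + 1; // endpos is inclusive
        return -positions * Math.log(alphabetSize);
    }
}
```

For example, under these assumptions a part of 10 positions over a 4-letter alphabet gets the value -10 * log(4), regardless of which symbols occur.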

isInitialized

public boolean isInitialized()
Returns true if the model is trained, false otherwise.

Returns:
true if the model is trained, false otherwise

fromXML

public void fromXML(StringBuffer representation)
             throws NonParsableException
Description copied from class: AbstractTrainableStatisticalModel
This method should only be used by the constructor that works on a StringBuffer. It is the counterpart of Storable.toXML().

Specified by:
fromXML in class AbstractTrainableStatisticalModel
Parameters:
representation - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Returns:
the XML representation

toString

public String toString(NumberFormat nf)
Returns the String "".

Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance

train

@Deprecated
public void train(DataSet data,
                             double[] weights)
           throws IOException
Deprecated. 

Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have as many entries as there are sequences in the data set. (Optionally, weights == null can be used if all weights have the value one.)
This method should work non-incrementally: after the calls train(data1); train(data2), the result should be a model fully trained on data2 alone, not on data1+data2. All parameters of the model are given by the call of the constructor.

Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
IOException
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
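
The non-incremental, weighted training contract described above is inherited from the interface. It can be illustrated on a hypothetical frequency model; FrequencyModelSketch is an assumption made for this sketch and is not how UniformTrainSM (whose train method is deprecated) behaves. Sequences are encoded as int arrays of symbol indices.

```java
class FrequencyModelSketch {
    private final double[] probs;
    private boolean trained = false;

    FrequencyModelSketch(int alphabetSize) {
        probs = new double[alphabetSize];
    }

    // weights == null is treated as "all weights are one".
    void train(int[][] data, double[] weights) {
        if (weights != null && weights.length != data.length) {
            throw new IllegalArgumentException("one weight per sequence is required");
        }
        // Fresh counts on every call: nothing is kept from earlier training.
        double[] counts = new double[probs.length];
        double total = 0;
        for (int i = 0; i < data.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            for (int symbol : data[i]) {
                counts[symbol] += w;
                total += w;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no weighted observations");
        }
        for (int a = 0; a < probs.length; a++) {
            probs[a] = counts[a] / total;
        }
        trained = true;
    }

    double getProb(int symbol) {
        return probs[symbol];
    }

    boolean isInitialized() {
        return trained;
    }
}
```

Calling train twice on this sketch leaves only the estimates from the second data set, which is the behavior the contract asks for.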

emitDataSet

public DataSet emitDataSet(int n,
                           int... lengths)
                    throws Exception
Description copied from interface: StatisticalModel
This method returns a DataSet object containing artificial sequence(s).

There are two different possibilities to create a data set for a model with length 0 (homogeneous models).
  1. emitDataSet( int n, int l ) should return a data set with n sequences of length l.
  2. emitDataSet( int n, int[] l ) should return a data set with n sequences, where each sequence has the length given by the corresponding entry of the array l.

There are two different possibilities to create a data set for a model with length greater than 0 (inhomogeneous models).
emitDataSet( int n ) and emitDataSet( int n, null ) should return a data set with n sequences whose length is the length of the model (SequenceScore.getLength()).

The standard implementation throws an Exception.

Specified by:
emitDataSet in interface StatisticalModel
Overrides:
emitDataSet in class AbstractTrainableStatisticalModel
Parameters:
n - the number of sequences that should be contained in the returned data set
lengths - the length of the sequences for a homogeneous model; for an inhomogeneous model this parameter should be null or an array of size 0.
Returns:
a DataSet containing the artificial sequence(s)
Throws:
Exception - if the emission did not succeed
NotTrainedException - if the model is not trained yet
See Also:
DataSet
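
The two calling conventions for homogeneous models (one common length, or one length per sequence) can be sketched with a varargs method. This is an illustration only: UniformEmitSketch is hypothetical, sequences are plain int arrays of symbol indices rather than a DataSet, and the random source is passed in explicitly as an assumption of the sketch.

```java
import java.util.Random;

class UniformEmitSketch {

    // Emits n sequences whose symbols are drawn uniformly from
    // {0, ..., alphabetSize - 1}. lengths holds either one common length
    // or exactly n individual lengths.
    static int[][] emit(Random rnd, int alphabetSize, int n, int... lengths) {
        if (lengths == null || (lengths.length != 1 && lengths.length != n)) {
            throw new IllegalArgumentException("expected 1 or n sequence lengths");
        }
        int[][] data = new int[n][];
        for (int i = 0; i < n; i++) {
            int len = (lengths.length == 1) ? lengths[0] : lengths[i];
            data[i] = new int[len];
            for (int j = 0; j < len; j++) {
                data[i][j] = rnd.nextInt(alphabetSize);
            }
        }
        return data;
    }
}
```

A call such as emit(rnd, 4, 3, 5) would produce three sequences of length 5, while emit(rnd, 4, 2, 3, 7) would produce one sequence of length 3 and one of length 7.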

getLogPriorTerm

public double getLogPriorTerm()
                       throws Exception
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong

getMaximalMarkovOrder

public byte getMaximalMarkovOrder()
                           throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.

Specified by:
getMaximalMarkovOrder in interface StatisticalModel
Overrides:
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Returns:
maximal used Markov order
Throws:
UnsupportedOperationException - if the model can't give a proper answer

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().

Returns:
the numerical characteristics of the current instance
Throws:
Exception - if some of the characteristics could not be defined

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name