de.jstacs.sequenceScores.statisticalModels.trainable
Class VariableLengthWrapperTrainSM

java.lang.Object
  extended by de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
      extended by de.jstacs.sequenceScores.statisticalModels.trainable.VariableLengthWrapperTrainSM
All Implemented Interfaces:
SequenceScore, StatisticalModel, TrainableStatisticalModel, Storable, Cloneable

public class VariableLengthWrapperTrainSM
extends AbstractTrainableStatisticalModel

This class allows to train any TrainableStatisticalModel on DataSets of Sequences with variable length if each individual length is at least SequenceScore.getLength(). All other methods are piped to the internally used TrainableStatisticalModel.

This class might be useful in any ClassifierAssessment.

Author:
Jens Keilwagen
See Also:
DataSet.WeightedDataSetFactory.DataSet.WeightedDataSetFactory(DataSet.WeightedDataSetFactory.SortOperation, DataSet, double[], int), ClassifierAssessment

Field Summary
 
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
alphabets, length
 
Constructor Summary
VariableLengthWrapperTrainSM(StringBuffer stringBuff)
          The standard constructor for the interface Storable.
VariableLengthWrapperTrainSM(TrainableStatisticalModel m)
          This is the main constructor that creates an instance from any TrainableStatisticalModel.
 
Method Summary
 VariableLengthWrapperTrainSM clone()
          Follows the conventions of Object's clone()-method.
protected  void fromXML(StringBuffer xml)
          This method should only be used by the constructor that works on a StringBuffer.
 String getInstanceName()
          Should return a short instance name such as iMM(0), BN(2), ...
 double getLogPriorTerm()
          Returns a value that is proportional to the log of the prior.
 double getLogProbFor(Sequence sequence, int startpos, int endpos)
          Returns the logarithm of the probability of (a part of) the given sequence given the model.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
 boolean isInitialized()
          This method can be used to determine whether the instance is initialized.
 String toString(NumberFormat nf)
          This method returns a String representation of the instance.
 StringBuffer toXML()
          This method returns an XML representation as StringBuffer of an instance of the implementing class.
 void train(DataSet data, double[] weights)
          Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights.
 
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
check, emitDataSet, getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, toString, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

VariableLengthWrapperTrainSM

public VariableLengthWrapperTrainSM(TrainableStatisticalModel m)
                             throws CloneNotSupportedException
This is the main constructor that creates an instance from any TrainableStatisticalModel.

Parameters:
m - the model
Throws:
CloneNotSupportedException - if the mode m could not be cloned

VariableLengthWrapperTrainSM

public VariableLengthWrapperTrainSM(StringBuffer stringBuff)
                             throws NonParsableException
The standard constructor for the interface Storable. Creates a new VariableLengthWrapperTrainSM out of a StringBuffer.

Parameters:
stringBuff - the StringBuffer to be parsed
Throws:
NonParsableException - is thrown if the StringBuffer could not be parsed
Method Detail

clone

public VariableLengthWrapperTrainSM clone()
                                   throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.

Specified by:
clone in interface SequenceScore
Specified by:
clone in interface TrainableStatisticalModel
Overrides:
clone in class AbstractTrainableStatisticalModel
Returns:
an object, that is a copy of the current AbstractTrainableStatisticalModel (the member-AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as:
1. Object o = (X)super.clone();
2. all additional member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning

fromXML

protected void fromXML(StringBuffer xml)
                throws NonParsableException
Description copied from class: AbstractTrainableStatisticalModel
This method should only be used by the constructor that works on a StringBuffer. It is the counter part of Storable.toXML().

Specified by:
fromXML in class AbstractTrainableStatisticalModel
Parameters:
xml - the XML representation of the model
Throws:
NonParsableException - if the StringBuffer is not parsable or the representation is conflicting
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)

toXML

public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.

Returns:
the XML representation

getInstanceName

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

Returns:
a short instance name

getLogPriorTerm

public double getLogPriorTerm()
                       throws Exception
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior. For maximum likelihood (ML) 0 should be returned.

Returns:
a value that is proportional to the log of the prior
Throws:
Exception - if something went wrong

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().

Returns:
the numerical characteristics of the current instance
Throws:
Exception - if some of the characteristics could not be defined

getLogProbFor

public double getLogProbFor(Sequence sequence,
                            int startpos,
                            int endpos)
                     throws NotTrainedException,
                            Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. If at least one random variable is continuous the value of density function is returned.

It extends the possibility given by the method StatisticalModel.getLogProbFor(Sequence, int) by the fact, that the model could be e.g. homogeneous and therefore the length of the sequences, whose probability should be returned, is not fixed. Additionally, the end position of the part of the given sequence is given and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled and therefore both has to be checked.

Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Returns:
the logarithm of the probability or the value of the density function of (the part of) the given sequence given the model
Throws:
NotTrainedException - if the model is not trained yet
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model

isInitialized

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized you should be able to invoke SequenceScore.getLogScoreFor(Sequence).

Returns:
true if the instance is initialized, false otherwise

train

public void train(DataSet data,
                  double[] weights)
           throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i. So the array weight should have the number of sequences in the data set as dimension. (Optionally it is possible to use weight == null if all weights have the value one.)
This method should work non-incrementally. That means the result of the following series: train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.

Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator

toString

public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.

Parameters:
nf - the NumberFormat for the String representation of parameters or probabilities
Returns:
a String representation of the instance