Class SamplingScoreBasedClassifier

  extended by de.jstacs.classifier.AbstractClassifier
      extended by de.jstacs.classifier.AbstractScoreBasedClassifier
          extended by de.jstacs.classifier.scoringFunctionBased.sampling.SamplingScoreBasedClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:

public abstract class SamplingScoreBasedClassifier
extends AbstractScoreBasedClassifier

A classifier that samples the parameters of SamplingScoringFunctions by the Metropolis-Hastings algorithm. The distribution the parameters are sampled from is the distribution $P(\vec{\lambda}^{t})$ represented by the SFBasedOptimizableFunction returned by getFunction(Sample[], double[][]). As proposal distribution, a Gaussian distribution with given sampling variance is used for each parameter. Specifically, a new set of parameters $\vec{\lambda}^{t}$ is drawn from a proposal distribution $Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})$, where

\[ Q(\vec{\lambda}^{t}|\vec{\lambda}^{t-1}) = \prod_{i} \mathcal{N}(\lambda_i^{t}|\lambda_i^{t-1},\sigma_i^2)\]
and $\sigma_i^2$ is the sampling variance for parameter $\lambda_i$. The sampling variances are adapted to the size of the event space of each parameter based on a class-dependent variance provided to the constructor. This adaptation depends on the correct implementation of NormalizableScoringFunction.getSizeOfEventSpaceForRandomVariablesOfParameter(int). Let $s_i$ be the size of the event space of the random variable of parameter $\lambda_i$, and let $\sigma^{2}$ be the class-dependent variance for the SamplingScoringFunction that $\lambda_i$ is a parameter of. Then $\sigma_i^2:=\sigma^{2} \cdot s_i$. If $s_i=0$, then $\sigma_i^2:=\sigma^{2}$. After a new set of parameters $\vec{\lambda}^{t}$ has been drawn, the sampling process decides whether this new set of parameters is accepted according to the distribution $P(\vec{\lambda}^{t})$ that we want to sample from. Specifically, the parameters are accepted if and only if
\[ \alpha < \frac{ P(\vec{\lambda}^{t})Q(\vec{\lambda}^{t-1} | \vec{\lambda}^{t}) }{P(\vec{\lambda}^{t-1}) Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})},\]
where $\alpha$ is drawn from a uniform distribution in $[0,1]$, i.e. Random.nextDouble(). Otherwise, the parameters are rejected and $\vec{\lambda}^{t}:=\vec{\lambda}^{t-1}$. Since the Gaussian distribution is symmetric around its mean, $Q(\vec{\lambda}^{t-1} | \vec{\lambda}^{t})=Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})$, both terms cancel, and the acceptance probability depends only on the current and previous values of $P$. All sampled parameters are stored to separate temporary files for each concurrent sampling run by an internal SamplingComponent. The contents of these files are stored together with the remaining representation of the SamplingScoreBasedClassifier, if AbstractClassifier.toXML() is called, and, hence, can be stored to a monolithic file containing all information for, e.g., later classification procedures. For determining the length of the burn-in phase and, as a consequence, the beginning of the stationary phase, a BurnInTest can be provided to the constructor of the classifier.
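
The acceptance rule described above can be sketched in plain Java, independent of Jstacs. The sketch assumes a hypothetical LogDensity interface standing in for the function returned by getFunction(Sample[], double[][]), compares values in log space, and uses a one-dimensional standard normal as a toy target. None of the names below exist in Jstacs, and the actual implementation may differ.

```java
import java.util.Random;

/** Illustrative sketch only; not part of Jstacs. */
final class MetropolisHastingsSketch {

    /** Hypothetical stand-in for P: returns log P(lambda) up to an additive constant. */
    interface LogDensity {
        double logValue(double[] lambda);
    }

    /**
     * Runs a Metropolis-Hastings chain with an independent Gaussian proposal per parameter.
     * sigma[i] is the standard deviation (square root of the sampling variance) of parameter i.
     */
    static double[][] sample(LogDensity logP, double[] start, double[] sigma, int steps, Random rnd) {
        double[][] chain = new double[steps][];
        double[] current = start.clone();
        double currentLog = logP.logValue(current);
        for (int t = 0; t < steps; t++) {
            // Draw the proposal centred on the previous parameters.
            double[] proposal = new double[current.length];
            for (int i = 0; i < current.length; i++) {
                proposal[i] = current[i] + sigma[i] * rnd.nextGaussian();
            }
            double proposalLog = logP.logValue(proposal);
            // Symmetric proposal: Q cancels, so accept iff alpha < P(new) / P(old).
            // The comparison is done in log space to avoid numerical underflow.
            double alpha = rnd.nextDouble();
            if (Math.log(alpha) < proposalLog - currentLog) {
                current = proposal;
                currentLog = proposalLog;
            }
            // On rejection the previous parameters are kept for this step.
            chain[t] = current.clone();
        }
        return chain;
    }

    public static void main(String[] args) {
        // Toy target: standard normal in one dimension, log P(x) = -x^2 / 2 + const.
        LogDensity target = lambda -> -0.5 * lambda[0] * lambda[0];
        double[][] chain = sample(target, new double[] {3.0}, new double[] {1.0}, 20000, new Random(42));
        double mean = 0;
        for (double[] draw : chain) {
            mean += draw[0];
        }
        System.out.println("sample mean = " + mean / chain.length);
    }
}
```

For the toy target, the sample mean should lie close to 0 once the chain has left its starting point at 3.0.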

Author:
Jan Grau

Nested Class Summary
static class SamplingScoreBasedClassifier.SamplingScheme
          Sampling scheme for sampling the parameters of the scoring functions.
protected  class SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent
          The SamplingComponent that handles storing and loading sampled parameters values to and from files.
Nested classes/interfaces inherited from class de.jstacs.classifier.AbstractScoreBasedClassifier
Field Summary
protected  BurnInTest burnInTest
          The BurnInTest, may be null for no test
protected  double[] currentParameters
          the currently accepted parameters
protected  double currentScore
          The score achieved using currentParameters
protected  double[] initParameters
          The initial parameters if set by setInitParameters(double[]), null otherwise
protected  double[][] lastParameters
          The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest
protected  double[] lastScore
          The scores yielded for the parameters in lastParameters
protected  SamplingScoreBasedClassifierParameterSet params
protected  double[] previousParameters
          The previously accepted parameters, backup for rollbacks
protected  SamplingScoringFunction[] scoringFunctions
Constructor Summary
protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingScoringFunction... scoringFunctions)
          Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingScoringFunctions for each of the classes.
  SamplingScoreBasedClassifier(StringBuffer xml)
          This is the constructor for Storable.
Method Summary
protected  double doOneSamplingStep(SFBasedOptimizableFunction function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue)
          Performs one sampling step, i.e., one sampling of all parameter values.
 void doSingleSampling(Sample[] s, double[][] weights, int numSteps, String outfilePrefix)
          Does a single sampling run for a predefined number of steps.
protected  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
protected  double[] getBestParameters()
          Returns the sampled parameter values with the maximum value of the objective function
 CategoricalResult[] getClassifierAnnotation()
          Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.
 boolean getDeleteOnExit()
          Returns true if the temporary parameter files shall be deleted on exit of the program.
protected abstract  SFBasedOptimizableFunction getFunction(Sample[] data, double[][] weights)
          Returns the function that should be sampled from.
protected  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
 String getInstanceName()
          Returns a short description of the classifier.
protected  double[] getMeanParameters(boolean testBurnIn, int minBurnInSteps)
          Returns the mean parameters over all samplings of all stationary phases.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().
protected  SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent getSamplingComponent()
          Returns a sampling component suited for this SamplingScoreBasedClassifier
protected  double getScore(Sequence seq, int cls, boolean check)
          This method returns the score for a given Sequence and a given class.
 double[] getScores(Sample s)
          This method returns the scores of the classifier for any Sequence in the Sample.
 File getTempDir()
          Returns the directory for parameter files set in this SamplingScoreBasedClassifier.
protected  void init(int starts, boolean adaptVariance, String outfilePrefix)
          Initializes all internal fields and initializes the scoringFunctions randomly
 boolean isTrained()
          This method gives information about the state of the classifier.
 void joinAndSetParameterFiles(boolean add, File... files)
          Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier
protected  double modifyFunctionValue(double value)
          Allows for a modification of the value returned by the function obtained by getFunction(Sample[], double[][]).
protected  void precomputeBurnInLength(SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent sfsc)
          Precomputes the length of the burn-in phase, e.g. useful for computing scores of multiple sequences
protected  void sample(SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent sfsc, SFBasedOptimizableFunction function)
          Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.
protected  double sampleNSteps(SFBasedOptimizableFunction function, SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme)
          Samples a predefined number of steps appended to the current sampling
 void setDeleteOnExit(boolean deleteOnExit)
          If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program.
 void setInitParameters(double[] parameters)
          Sets the initial parameters of the sampling to parameters.
 void setTempDir(File tempDir)
          Sets the directory for parameter files set in this SamplingScoreBasedClassifier.
 void train(Sample[] s, double[][] weights)
          This method trains a classifier over an array of weighted Samples.
Methods inherited from class de.jstacs.classifier.AbstractScoreBasedClassifier
check, check, classify, classify, clone, createDefaultClassWeights, getClassWeight, getClassWeights, getNumberOfClasses, getPValue, getPValue, getResults, getScore, setClassWeights, setClassWeights, setThresholdClassWeights, test
Methods inherited from class de.jstacs.classifier.AbstractClassifier
classify, evaluate, evaluateAll, getAlphabetContainer, getCharacteristics, getClassificationRate, getLength, getMeasuresForEvaluate, getMeasuresForEvaluateAll, getXMLTag, setNewAlphabetContainerInstance, toXML, train
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


protected SamplingScoreBasedClassifierParameterSet params


protected SamplingScoringFunction[] scoringFunctions


protected double[] currentParameters
the currently accepted parameters


protected double[] initParameters
The initial parameters if set by setInitParameters(double[]), null otherwise


protected double currentScore
The score achieved using currentParameters


protected double[] previousParameters
The previously accepted parameters, backup for rollbacks


protected double[][] lastParameters
The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest


protected double[] lastScore
The scores yielded for the parameters in lastParameters


protected BurnInTest burnInTest
The BurnInTest, may be null for no test

Constructor Detail


public SamplingScoreBasedClassifier(StringBuffer xml)
                             throws NonParsableException
This is the constructor for Storable.

xml - the xml representation
NonParsableException - if the representation could not be parsed.


protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params,
                                       BurnInTest burnInTest,
                                       double[] classVariances,
                                       SamplingScoringFunction... scoringFunctions)
                                throws CloneNotSupportedException
Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingScoringFunctions for each of the classes.

params - the external parameters of this classifier
burnInTest - the burn-in test (or null for no burn-in test)
classVariances - the variances used for sampling for the parameters of each class
scoringFunctions - the scoring functions for each of the classes
CloneNotSupportedException - if the scoring functions or the burn-in test could not be cloned
Method Detail


protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.

getFurtherClassifierInfos in class AbstractScoreBasedClassifier
further information of a classifier as a StringBuffer


protected void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                             throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.

extractFurtherClassifierInfosFromXML in class AbstractScoreBasedClassifier
xml - the XML representation as StringBuffer
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)


public CategoricalResult[] getClassifierAnnotation()
Description copied from class: AbstractClassifier
Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );

Specified by:
getClassifierAnnotation in class AbstractClassifier
an array of Results that contains information about the classifier


public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from class: AbstractClassifier
Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().

Specified by:
getNumericalCharacteristics in class AbstractClassifier
the numerical characteristics
Exception - if some of the characteristics could not be defined


public String getInstanceName()
Description copied from class: AbstractClassifier
Returns a short description of the classifier.

Specified by:
getInstanceName in class AbstractClassifier
a short description of the classifier


protected abstract SFBasedOptimizableFunction getFunction(Sample[] data,
                                                          double[][] weights)
                                                   throws Exception
Returns the function that should be sampled from.

data - the samples
weights - the weights of the sequences of the samples
the function that should be sampled from
Exception - if the function could not be created


protected double modifyFunctionValue(double value)
Allows for a modification of the value returned by the function obtained by getFunction(Sample[], double[][]). This is for instance necessary in case of LogGenDisMixFunction to obtain a proper posterior or supervised posterior.

value - the original value
the modified value


protected SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent getSamplingComponent()
Returns a sampling component suited for this SamplingScoreBasedClassifier

the sampling component


public File getTempDir()
Returns the directory for parameter files set in this SamplingScoreBasedClassifier. If this value is null, the default directory of the executing OS is used for the parameter files.

the temp directory


public void setTempDir(File tempDir)
Sets the directory for parameter files set in this SamplingScoreBasedClassifier. If tempDir is null, the default directory of the executing OS is used for the parameter files. If this value is reset after training, all sampled parameters will be lost. The value set by this method is not stored in the XML-representation.

tempDir - the temp directory


public boolean getDeleteOnExit()
Returns true if the temporary parameter files shall be deleted on exit of the program.

true if temp files are deleted on exit


public void setDeleteOnExit(boolean deleteOnExit)
                     throws Exception
If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. Once sampling has started, a value of true cannot be reset to false because of the restrictions of File.deleteOnExit(). To retain the parameters nonetheless, call AbstractClassifier.toXML() and save the resulting StringBuffer, which also contains the sampled parameter values. The value set by this method is not stored in the XML-representation.

deleteOnExit - if temp files shall be deleted on exit
Exception - if set to false after sampling started
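
The behaviour of the temp directory and the delete-on-exit flag mirrors the standard java.io.File API. The following is a minimal sketch of that API only; the class name, method name, and the ".dat" suffix are hypothetical and not taken from Jstacs.

```java
import java.io.File;
import java.io.IOException;

/** Illustrative sketch only; not part of Jstacs. */
final class TempParameterFileSketch {

    /** Creates a temporary file in tempDir, or in the OS default temp directory if tempDir is null. */
    static File createParameterFile(String prefix, File tempDir, boolean deleteOnExit) throws IOException {
        // The suffix ".dat" is an arbitrary choice for this example.
        File file = File.createTempFile(prefix, ".dat", tempDir);
        if (deleteOnExit) {
            // A deletion request cannot be cancelled later,
            // which is why the flag cannot be switched back to false afterwards.
            file.deleteOnExit();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File file = createParameterFile("params", null, true);
        System.out.println(file.getAbsolutePath());
    }
}
```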


protected void init(int starts,
                    boolean adaptVariance,
                    String outfilePrefix)
             throws Exception
Initializes all internal fields and initializes the scoringFunctions randomly

starts - number of starts
adaptVariance - if true, variance is adapted to size of event space
outfilePrefix - the prefix of the outfiles
Exception - if the scoring functions could not be initialized
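
The adaptation rule from the class description (class-dependent variance multiplied by the size of the event space, with a fallback to the class-dependent variance for size 0) can be sketched as follows. The class and method names are hypothetical, and treating adaptVariance = false as "use the class-dependent variance unchanged" is an assumption of this sketch.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class VarianceAdaptationSketch {

    /**
     * Returns one sampling variance per parameter.
     * classVariance is the class-dependent variance, eventSpaceSizes[i] is s_i.
     */
    static double[] adapt(double classVariance, int[] eventSpaceSizes, boolean adaptVariance) {
        double[] variances = new double[eventSpaceSizes.length];
        for (int i = 0; i < variances.length; i++) {
            int size = eventSpaceSizes[i];
            // s_i = 0 (or adaptation switched off, an assumption here) keeps the class variance.
            variances[i] = (adaptVariance && size != 0) ? classVariance * size : classVariance;
        }
        return variances;
    }

    public static void main(String[] args) {
        double[] variances = adapt(0.5, new int[] {4, 0, 1}, true);
        System.out.println(java.util.Arrays.toString(variances));
    }
}
```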


protected double sampleNSteps(SFBasedOptimizableFunction function,
                              SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent component,
                              BurnInTest test,
                              int numSteps,
                              SamplingScoreBasedClassifier.SamplingScheme scheme)
                       throws Exception
Samples a predefined number of steps appended to the current sampling

function - the objective function
component - the sampling component with selected sampling
test - the burn-in test
numSteps - the number of steps
scheme - the SamplingScoreBasedClassifier.SamplingScheme
the value of the last accepted parameters
Exception - if either the function could not be evaluated on the current parameters or the sampled parameters could not be stored


protected void sample(SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent sfsc,
                      SFBasedOptimizableFunction function)
               throws Exception
Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.

sfsc - the current sampling component
function - the objective function
Exception - if the sampling could not be extended, e.g. due to evaluation errors


protected double doOneSamplingStep(SFBasedOptimizableFunction function,
                                   SamplingScoreBasedClassifier.SamplingScheme scheme,
                                   double previousValue)
                            throws Exception
Performs one sampling step, i.e., one sampling of all parameter values.

function - the objective function
scheme - the SamplingScoreBasedClassifier.SamplingScheme
previousValue - the value of the last sampling or minus infinity for the first sampling run
the value of the last accepted parameter values or Double.NaN if none of the sampled parameters were accepted
Exception - if the function could not be evaluated or an unknown SamplingScoreBasedClassifier.SamplingScheme was provided
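
Why minus infinity is a convenient previousValue for the first run can be seen in a small sketch. It assumes that function values are on a log scale and that a single proposal is tested; the names are hypothetical, and the real method may handle several proposals per step depending on the SamplingScheme.

```java
import java.util.Random;

/** Illustrative sketch only; not part of Jstacs. */
final class AcceptanceSketch {

    /**
     * Decides on one proposal given log-scale function values.
     * Returns newValue if the proposal is accepted and Double.NaN otherwise.
     */
    static double acceptOrNaN(double previousValue, double newValue, Random rnd) {
        // With previousValue = -infinity and a finite newValue the difference is +infinity,
        // so the very first proposal is always accepted.
        double logAlpha = Math.log(rnd.nextDouble());
        return logAlpha < newValue - previousValue ? newValue : Double.NaN;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        System.out.println(acceptOrNaN(Double.NEGATIVE_INFINITY, -12.5, rnd));
        System.out.println(acceptOrNaN(-5.0, -3.0, rnd));
    }
}
```

A proposal whose value is at least as large as the previous one is always accepted, because the logarithm of a number drawn from [0,1) is negative.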


protected double getScore(Sequence seq,
                          int cls,
                          boolean check)
                   throws IllegalArgumentException,
                          NotTrainedException,
                          Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the score for a given Sequence and a given class.

Specified by:
getScore in class AbstractScoreBasedClassifier
seq - the Sequence
cls - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
the score for a given Sequence and a given class
IllegalArgumentException - if something is wrong with the Sequence seq
NotTrainedException - if the classifier is not trained
Exception - if something went wrong


public double[] getScores(Sample s)
                   throws Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the scores of the classifier for any Sequence in the Sample. The scores are stored in the array according to the index of the Sequence in the Sample.

Only for 2-class-classifiers.

getScores in class AbstractScoreBasedClassifier
s - the Sample
the array of scores
Exception - if something went wrong


public void setInitParameters(double[] parameters)
Sets the initial parameters of the sampling to parameters.

parameters - the initial parameters


public boolean isTrained()
Description copied from class: AbstractClassifier
This method gives information about the state of the classifier.

Specified by:
isTrained in class AbstractClassifier
true if the classifier is trained and therefore able to classify sequences, otherwise false


public void doSingleSampling(Sample[] s,
                             double[][] weights,
                             int numSteps,
                             String outfilePrefix)
                      throws Exception
Does a single sampling run for a predefined number of steps.

s - the data
weights - the weights for the data
numSteps - the number of sampling steps
outfilePrefix - the prefix of the outfile where the parameter values are stored
Exception - if the scoring functions could not be initialized or the sampling could not be extended, e.g. due to evaluation errors


public void train(Sample[] s,
                  double[][] weights)
           throws Exception
Description copied from class: AbstractClassifier
This method trains a classifier over an array of weighted Samples. Like the method AbstractClassifier.train(Sample...), it should work non-incrementally.

This method should check that the Samples are defined over the underlying alphabet and length.

Specified by:
train in class AbstractClassifier
s - an array of Samples
weights - the weights for the Samples
Exception - if the weights are incorrect or the training did not succeed


protected void precomputeBurnInLength(SamplingScoreBasedClassifier.ScoringFunctionSamplingComponent sfsc)
                               throws Exception
Precomputes the length of the burn-in phase, e.g. useful for computing scores of multiple sequences

sfsc - the current sampling component
Exception - if the parameter values could not be parsed


protected double[] getBestParameters()
                              throws Exception
Returns the sampled parameter values with the maximum value of the objective function

the best parameters
Exception - if the parameter values could not be parsed
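
Conceptually, selecting the best parameters is an argmax over the stored samples. The sketch below works on in-memory arrays with hypothetical names, whereas the classifier itself reads the sampled values from its parameter files.

```java
/** Illustrative sketch only; not part of Jstacs. */
final class BestParametersSketch {

    /** Returns a copy of the parameter vector whose score is maximal; ties go to the earliest sample. */
    static double[] best(double[][] parameters, double[] scores) {
        int bestIndex = 0;
        for (int t = 1; t < scores.length; t++) {
            if (scores[t] > scores[bestIndex]) {
                bestIndex = t;
            }
        }
        return parameters[bestIndex].clone();
    }

    public static void main(String[] args) {
        double[][] parameters = {{1, 2}, {3, 4}, {5, 6}};
        double[] scores = {-10, -2, -7};
        System.out.println(java.util.Arrays.toString(best(parameters, scores)));
    }
}
```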


protected double[] getMeanParameters(boolean testBurnIn,
                                     int minBurnInSteps)
                              throws Exception
Returns the mean parameters over all samplings of all stationary phases.

testBurnIn - true if the length of the burn-in phase shall be computed
minBurnInSteps - minimum number of steps considered as burn-in
the mean parameters
Exception - if the parameter values could not be parsed
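
Averaging over the stationary phases of several samplings amounts to discarding the burn-in steps of each run and taking the mean of the rest. A minimal sketch, assuming in-memory arrays and a burn-in length per run that has already been determined (all names are hypothetical):

```java
/** Illustrative sketch only; not part of Jstacs. */
final class MeanParametersSketch {

    /**
     * samples[run][step][param] holds the sampled values;
     * burnIn[run] is the number of leading steps discarded for that run.
     */
    static double[] mean(double[][][] samples, int[] burnIn) {
        int numParams = samples[0][0].length;
        double[] mean = new double[numParams];
        long count = 0;
        for (int run = 0; run < samples.length; run++) {
            // Only steps after the burn-in phase contribute to the mean.
            for (int step = burnIn[run]; step < samples[run].length; step++) {
                for (int i = 0; i < numParams; i++) {
                    mean[i] += samples[run][step][i];
                }
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no stationary samples available");
        }
        for (int i = 0; i < numParams; i++) {
            mean[i] /= count;
        }
        return mean;
    }
}
```

For example, with two runs {100, 1, 3} and {50, 50, 2} of a single parameter and burn-in lengths 1 and 2, the stationary values are 1, 3 and 2, so the mean is 2.0.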


public void joinAndSetParameterFiles(boolean add,
                                     File... files)
                              throws Exception
Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier

add - if true, parameter files are appended to the current ones, i.e., the number of samplings is augmented by these files
files - the parameter files
Exception - if the parameter files could not be joined