de.jstacs.classifiers.differentiableSequenceScoreBased.sampling
Class SamplingScoreBasedClassifier

java.lang.Object
  extended by de.jstacs.classifiers.AbstractClassifier
      extended by de.jstacs.classifiers.AbstractScoreBasedClassifier
          extended by de.jstacs.classifiers.differentiableSequenceScoreBased.sampling.SamplingScoreBasedClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
SamplingGenDisMixClassifier

public abstract class SamplingScoreBasedClassifier
extends AbstractScoreBasedClassifier

A classifier that samples the parameters of SamplingDifferentiableStatisticalModels by the Metropolis-Hastings algorithm. The distribution the parameters are sampled from is the distribution $P(\vec{\lambda}^{t})$ represented by the DiffSSBasedOptimizableFunction returned by getFunction(DataSet[], double[][]). As proposal distribution, a Gaussian distribution with given sampling variance is used for each parameter. Specifically, a new set of parameters $\vec{\lambda}^{t}$ is drawn from a proposal distribution $Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})$, where

\[ Q(\vec{\lambda}^{t}|\vec{\lambda}^{t-1}) = \prod_{i} \mathcal{N}(\lambda_i^{t}|\lambda_i^{t-1},\sigma_i^2)\]
and $\sigma_i^2$ is the sampling variance for parameter $\lambda_i$. The sampling variances are adapted to the size of the event space of each parameter based on a class-dependent variance provided to the constructor. This adaptation depends on the correct implementation of DifferentiableStatisticalModel.getSizeOfEventSpaceForRandomVariablesOfParameter(int). Let $s_i$ be the size of the event space of the random variable of parameter $\lambda_i$, and let $\sigma^{2}$ be the class-dependent variance for the SamplingDifferentiableStatisticalModel that $\lambda_i$ is a parameter of. Then $\sigma_i^2:=\sigma^{2} \cdot s_i$. If $s_i=0$, then $\sigma_i^2:=\sigma^{2}$. After a new set of parameters $\vec{\lambda}^{t}$ has been drawn, the sampling process decides whether this new set of parameters is accepted according to the distribution $P(\vec{\lambda}^{t})$ that we want to sample from. Specifically, the parameters are accepted if and only if
\[ \alpha < \frac{ P(\vec{\lambda}^{t})Q(\vec{\lambda}^{t-1} | \vec{\lambda}^{t}) }{P(\vec{\lambda}^{t-1}) Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})},\]
where $\alpha$ is drawn from a uniform distribution in $[0,1]$, i.e. Random.nextDouble(). Otherwise, the parameters are rejected and $\vec{\lambda}^{t}:=\vec{\lambda}^{t-1}$. Since the Gaussian distribution is symmetric around its mean, $Q(\vec{\lambda}^{t-1} | \vec{\lambda}^{t})=Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})$, both terms cancel, and the acceptance probability depends only on the current and previous values of $P$. All sampled parameters are stored to separate temporary files for each concurrent sampling run by an internal SamplingComponent. The contents of these files are stored together with the remaining representation of the SamplingScoreBasedClassifier, if AbstractClassifier.toXML() is called, and, hence, can be stored to a monolithic file containing all information for, e.g., later classification procedures. For determining the length of the burn-in phase and, as a consequence, the beginning of the stationary phase, a BurnInTest can be provided to the constructor of the classifier.
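
The sampling scheme described above can be sketched in plain Java, independent of Jstacs. The class MetropolisSketch, its method sample, and its arguments are hypothetical names chosen for illustration; this is not the implementation used by SamplingScoreBasedClassifier. The sketch assumes that the target is available as a log value, so the acceptance test is carried out as log(alpha) < log P(new) - log P(old), and it relies on the symmetry of the Gaussian proposal so that the Q terms cancel.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class MetropolisSketch {

    /**
     * Runs a Metropolis-Hastings chain with an independent Gaussian proposal
     * per parameter. Returns one row of parameter values per step; a rejected
     * proposal repeats the previous row.
     *
     * logP      log of the (possibly unnormalized) target distribution (assumption)
     * variances sampling variance sigma_i^2 for each parameter
     */
    static double[][] sample(ToDoubleFunction<double[]> logP, double[] start,
                             double[] variances, int steps, Random rnd) {
        double[][] chain = new double[steps][];
        double[] current = start.clone();
        double currentLogP = logP.applyAsDouble(current);
        for (int t = 0; t < steps; t++) {
            // Draw each lambda_i^t from N(lambda_i^{t-1}, sigma_i^2).
            double[] proposal = new double[current.length];
            for (int i = 0; i < current.length; i++) {
                proposal[i] = current[i] + Math.sqrt(variances[i]) * rnd.nextGaussian();
            }
            double proposalLogP = logP.applyAsDouble(proposal);
            // Symmetric proposal: only the ratio of target values remains.
            if (Math.log(rnd.nextDouble()) < proposalLogP - currentLogP) {
                current = proposal;
                currentLogP = proposalLogP;
            }
            chain[t] = current.clone();
        }
        return chain;
    }
}
```

For example, calling MetropolisSketch.sample(x -> -0.5 * x[0] * x[0], new double[]{0.0}, new double[]{1.0}, 20000, new Random(42)) draws from a standard normal distribution, so the sample mean should be close to 0 and the sample variance close to 1.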

Author:
Jan Grau

Nested Class Summary
protected  class SamplingScoreBasedClassifier.DiffSMSamplingComponent
          The SamplingComponent that handles storing and loading sampled parameter values to and from files.
static class SamplingScoreBasedClassifier.SamplingScheme
          Sampling scheme for sampling the parameters of the scoring functions.
 
Nested classes/interfaces inherited from class de.jstacs.classifiers.AbstractScoreBasedClassifier
AbstractScoreBasedClassifier.DoubleTableResult
 
Field Summary
protected  BurnInTest burnInTest
          The BurnInTest, may be null for no test
protected  double[] currentParameters
          The currently accepted parameters
protected  double currentScore
          The score achieved using currentParameters
protected  double[] initParameters
          The initial parameters if set by setInitParameters(double[]), null otherwise
protected  double[][] lastParameters
          The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest
protected  double[] lastScore
          The scores yielded for the parameters in lastParameters
protected  SamplingScoreBasedClassifierParameterSet params
          Parameters
protected  double[] previousParameters
          The previously accepted parameters, backup for rollbacks
protected  SamplingDifferentiableStatisticalModel[] scoringFunctions
          SamplingDifferentiableStatisticalModels
 
Constructor Summary
protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingDifferentiableStatisticalModel... scoringFunctions)
          Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes.
  SamplingScoreBasedClassifier(StringBuffer xml)
          This is the constructor for Storable.
 
Method Summary
protected  double doOneSamplingStep(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue)
          Performs one sampling step, i.e., one sampling of all parameter values.
 void doSingleSampling(DataSet[] s, double[][] weights, int numSteps, String outfilePrefix)
          Does a single sampling run for a predefined number of steps.
protected  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an XML representation.
protected  double[] getBestParameters()
          Returns the sampled parameter values with the maximum value of the objective function
 CategoricalResult[] getClassifierAnnotation()
          Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...
 boolean getDeleteOnExit()
          Returns true if the temporary parameter files shall be deleted on exit of the program.
protected abstract  DiffSSBasedOptimizableFunction getFunction(DataSet[] data, double[][] weights)
          Returns the function that should be sampled from.
protected  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
 String getInstanceName()
          Returns a short description of the classifier.
protected  double[] getMeanParameters(boolean testBurnIn, int minBurnInSteps)
          Returns the mean parameters over all samplings of all stationary phases.
 NumericalResultSet getNumericalCharacteristics()
          Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().
protected  SamplingScoreBasedClassifier.DiffSMSamplingComponent getSamplingComponent()
          Returns a sampling component suited for this SamplingScoreBasedClassifier
protected  double getScore(Sequence seq, int cls, boolean check)
          This method returns the score for a given Sequence and a given class.
 double[] getScores(DataSet s)
          This method returns the scores of the classifier for any Sequence in the DataSet.
 File getTempDir()
          Returns the directory for parameter files set in this SamplingScoreBasedClassifier.
protected  void init(int starts, boolean adaptVariance, String outfilePrefix)
          Initializes all internal fields and initializes the scoringFunctions randomly.
 boolean isInitialized()
          This method gives information about the state of the classifier.
 void joinAndSetParameterFiles(boolean add, File... files)
          Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier
protected  double modifyFunctionValue(double value)
          Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]).
protected  void precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc)
          Precomputes the length of the burn-in phase, which is useful, e.g., for computing scores of multiple sequences.
protected  void sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc, DiffSSBasedOptimizableFunction function)
          Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.
protected  double sampleNSteps(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.DiffSMSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme)
          Samples a predefined number of steps appended to the current sampling
 void setDeleteOnExit(boolean deleteOnExit)
          If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program.
 void setInitParameters(double[] parameters)
          Sets the initial parameters of the sampling to parameters.
 void setTempDir(File tempDir)
          Sets the directory for parameter files set in this SamplingScoreBasedClassifier.
 void train(DataSet[] s, double[][] weights)
          This method trains a classifier over an array of weighted DataSets.
 
Methods inherited from class de.jstacs.classifiers.AbstractScoreBasedClassifier
check, check, classify, classify, clone, createDefaultClassWeights, getClassWeight, getClassWeights, getMultiClassScores, getNumberOfClasses, getPValue, getPValue, getResults, getScore, setClassWeights, setClassWeights, setThresholdClassWeights
 
Methods inherited from class de.jstacs.classifiers.AbstractClassifier
classify, evaluate, evaluate, getAlphabetContainer, getCharacteristics, getLength, getXMLTag, toXML, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

params

protected SamplingScoreBasedClassifierParameterSet params
Parameters


scoringFunctions

protected SamplingDifferentiableStatisticalModel[] scoringFunctions
SamplingDifferentiableStatisticalModels


currentParameters

protected double[] currentParameters
The currently accepted parameters


initParameters

protected double[] initParameters
The initial parameters if set by setInitParameters(double[]), null otherwise


currentScore

protected double currentScore
The score achieved using currentParameters


previousParameters

protected double[] previousParameters
The previously accepted parameters, backup for rollbacks


lastParameters

protected double[][] lastParameters
The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest


lastScore

protected double[] lastScore
The scores yielded for the parameters in lastParameters


burnInTest

protected BurnInTest burnInTest
The BurnInTest, may be null for no test

Constructor Detail

SamplingScoreBasedClassifier

public SamplingScoreBasedClassifier(StringBuffer xml)
                             throws NonParsableException
This is the constructor for Storable.

Parameters:
xml - the xml representation
Throws:
NonParsableException - if the representation could not be parsed.

SamplingScoreBasedClassifier

protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params,
                                       BurnInTest burnInTest,
                                       double[] classVariances,
                                       SamplingDifferentiableStatisticalModel... scoringFunctions)
                                throws CloneNotSupportedException
Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes.

Parameters:
params - the external parameters of this classifier
burnInTest - the burn-in test (or null for no burn-in test)
classVariances - the variances used for sampling for the parameters of each class
scoringFunctions - the scoring functions for each of the classes
Throws:
CloneNotSupportedException - if the scoring functions or the burn-in test could not be cloned
See Also:
VarianceRatioBurnInTest
Method Detail

getFurtherClassifierInfos

protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.

Overrides:
getFurtherClassifierInfos in class AbstractScoreBasedClassifier
Returns:
further information of a classifier as a StringBuffer
See Also:
AbstractClassifier.toXML()

extractFurtherClassifierInfosFromXML

protected void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                             throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.

Overrides:
extractFurtherClassifierInfosFromXML in class AbstractScoreBasedClassifier
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractClassifier.fromXML(StringBuffer)

getClassifierAnnotation

public CategoricalResult[] getClassifierAnnotation()
Description copied from class: AbstractClassifier
Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.

res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...

Specified by:
getClassifierAnnotation in class AbstractClassifier
Returns:
an array of Results that contains information about the classifier

getNumericalCharacteristics

public NumericalResultSet getNumericalCharacteristics()
                                               throws Exception
Description copied from class: AbstractClassifier
Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().

Specified by:
getNumericalCharacteristics in class AbstractClassifier
Returns:
the numerical characteristics
Throws:
Exception - if some of the characteristics could not be defined

getInstanceName

public String getInstanceName()
Description copied from class: AbstractClassifier
Returns a short description of the classifier.

Specified by:
getInstanceName in class AbstractClassifier
Returns:
a short description of the classifier

getFunction

protected abstract DiffSSBasedOptimizableFunction getFunction(DataSet[] data,
                                                              double[][] weights)
                                                       throws Exception
Returns the function that should be sampled from.

Parameters:
data - the samples
weights - the weights of the sequences of the samples
Returns:
the function that should be sampled from
Throws:
Exception - if the function could not be created

modifyFunctionValue

protected double modifyFunctionValue(double value)
Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]). This is for instance necessary in case of LogGenDisMixFunction to obtain a proper posterior or supervised posterior.

Parameters:
value - the original value
Returns:
the modified value

getSamplingComponent

protected SamplingScoreBasedClassifier.DiffSMSamplingComponent getSamplingComponent()
Returns a sampling component suited for this SamplingScoreBasedClassifier

Returns:
the sampling component

getTempDir

public File getTempDir()
Returns the directory for parameter files set in this SamplingScoreBasedClassifier. If this value is null, the default directory of the executing OS is used for the parameter files.

Returns:
the temp directory

setTempDir

public void setTempDir(File tempDir)
Sets the directory for parameter files set in this SamplingScoreBasedClassifier. If tempDir is null, the default directory of the executing OS is used for the parameter files. If this value is reset after training, all sampled parameters will be lost. The value set by this method is not stored in the XML-representation.

Parameters:
tempDir - the temp directory

getDeleteOnExit

public boolean getDeleteOnExit()
Returns true if the temporary parameter files shall be deleted on exit of the program.

Returns:
if temp files are deleted

setDeleteOnExit

public void setDeleteOnExit(boolean deleteOnExit)
                     throws Exception
If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. Once this value is true and sampling has started, it cannot be reset to false because of the restrictions of File.deleteOnExit(). If you nonetheless want to retain the sampled parameters, you can call AbstractClassifier.toXML() and save the returned StringBuffer, which also contains the sampled parameter values. The value set by this method is not stored in the XML representation.

Parameters:
deleteOnExit - if temp files shall be deleted on exit
Throws:
Exception - if set to false after sampling started
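
The interplay of the temp directory and the delete-on-exit flag can be illustrated with the standard java.io.File API alone. The class TempParameterFileSketch, the method createParameterFile, and the ".dat" suffix are hypothetical and only serve as a sketch; how the classifier actually names and creates its parameter files is not specified here. Two documented properties of the JDK are used: File.createTempFile falls back to the default temporary-file directory when the directory argument is null, and a deletion requested by File.deleteOnExit() cannot be cancelled afterwards.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class TempParameterFileSketch {

    /**
     * Creates an empty temporary file for sampled parameter values.
     *
     * prefix       file name prefix, at least three characters long
     * tempDir      target directory, or null for the default temp directory
     * deleteOnExit if true, the file is removed when the JVM terminates normally
     */
    static File createParameterFile(String prefix, File tempDir, boolean deleteOnExit) {
        try {
            File file = File.createTempFile(prefix, ".dat", tempDir);
            if (deleteOnExit) {
                // This request is irrevocable for the lifetime of the JVM.
                file.deleteOnExit();
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the deletion request cannot be withdrawn, a caller who wants to keep the files has to decide on the flag before the first file is created, which mirrors the restriction described above.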

init

protected void init(int starts,
                    boolean adaptVariance,
                    String outfilePrefix)
             throws Exception
Initializes all internal fields and initializes the scoringFunctions randomly.

Parameters:
starts - number of starts
adaptVariance - if true, variance is adapted to size of event space
outfilePrefix - the prefix of the outfiles
Throws:
Exception - if the scoring functions could not be initialized
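
The adaptation of the sampling variances to the size of the event space, as described in the class comment, can be sketched as follows. The class VarianceAdaptationSketch and the method adaptVariances are hypothetical names, and the array eventSpaceSizes stands in for the values that DifferentiableStatisticalModel.getSizeOfEventSpaceForRandomVariablesOfParameter(int) would return for each parameter. The sketch assumes that the adapted quantity is a variance and that the class-dependent variance is used unchanged when adaptVariance is false.

```java
final class VarianceAdaptationSketch {

    /**
     * Derives one sampling variance per parameter from a class-dependent variance.
     *
     * classVariance   the class-dependent variance sigma^2
     * eventSpaceSizes the event space size s_i for each parameter (assumed input)
     * adaptVariance   if false, every parameter receives classVariance (assumption)
     */
    static double[] adaptVariances(double classVariance, int[] eventSpaceSizes,
                                   boolean adaptVariance) {
        double[] variances = new double[eventSpaceSizes.length];
        for (int i = 0; i < variances.length; i++) {
            int s = eventSpaceSizes[i];
            // An event space of size 0 falls back to the unscaled class variance.
            variances[i] = (adaptVariance && s != 0) ? classVariance * s : classVariance;
        }
        return variances;
    }
}
```

With a class variance of 0.5 and event space sizes 4, 0 and 1, the adapted variances are 2.0, 0.5 and 0.5.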

sampleNSteps

protected double sampleNSteps(DiffSSBasedOptimizableFunction function,
                              SamplingScoreBasedClassifier.DiffSMSamplingComponent component,
                              BurnInTest test,
                              int numSteps,
                              SamplingScoreBasedClassifier.SamplingScheme scheme)
                       throws Exception
Samples a predefined number of steps appended to the current sampling

Parameters:
function - the objective function
component - the sampling component with selected sampling
test - the burn-in test
numSteps - the number of steps
scheme - the SamplingScoreBasedClassifier.SamplingScheme
Returns:
the value of the last accepted parameters
Throws:
Exception - if either the function could not be evaluated on the current parameters or the sampled parameters could not be stored

sample

protected void sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc,
                      DiffSSBasedOptimizableFunction function)
               throws Exception
Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.

Parameters:
sfsc - the current sampling component
function - the objective function
Throws:
Exception - if the sampling could not be extended, e.g. due to evaluation errors
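
The control flow of sampling until the stationary phase is reached and then collecting a fixed number of stationary steps might look roughly like the following plain-Java sketch. It does not use the BurnInTest interface; the class BurnInThenStationarySketch, the burnInLength callback, and the batch size are assumptions made for illustration only, and the actual method may organize its steps differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;
import java.util.function.ToIntFunction;

final class BurnInThenStationarySketch {

    /**
     * Extends a chain in batches until it contains stationarySteps values
     * after the estimated burn-in phase, and returns exactly those values.
     *
     * nextValue    produces the next sampled value (stand-in for one sampling step)
     * burnInLength estimates the burn-in length from the chain so far (hypothetical)
     */
    static List<Double> sample(DoubleSupplier nextValue,
                               ToIntFunction<List<Double>> burnInLength,
                               int stationarySteps, int batchSize) {
        List<Double> chain = new ArrayList<>();
        int burnIn;
        do {
            for (int i = 0; i < batchSize; i++) {
                chain.add(nextValue.getAsDouble());
            }
            // Re-estimate the burn-in length after every batch.
            burnIn = burnInLength.applyAsInt(chain);
        } while (chain.size() - burnIn < stationarySteps);
        return new ArrayList<>(chain.subList(burnIn, burnIn + stationarySteps));
    }
}
```

The loop only terminates if the estimated burn-in length eventually stops growing with the chain, so a production version would also need an upper bound on the number of steps.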

doOneSamplingStep

protected double doOneSamplingStep(DiffSSBasedOptimizableFunction function,
                                   SamplingScoreBasedClassifier.SamplingScheme scheme,
                                   double previousValue)
                            throws Exception
Performs one sampling step, i.e., one sampling of all parameter values.

Parameters:
function - the objective function
scheme - the SamplingScoreBasedClassifier.SamplingScheme
previousValue - the value of the last sampling or minus infinity for the first sampling run
Returns:
the value of the last accepted parameter values or Double.NaN if none of the sampled parameters were accepted
Throws:
Exception - if the function could not be evaluated or an unknown SamplingScoreBasedClassifier.SamplingScheme was provided

getScore

protected double getScore(Sequence seq,
                          int cls,
                          boolean check)
                   throws IllegalArgumentException,
                          NotTrainedException,
                          Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the score for a given Sequence and a given class.

Specified by:
getScore in class AbstractScoreBasedClassifier
Parameters:
seq - the Sequence
cls - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
Returns:
the score for a given Sequence and a given class
Throws:
IllegalArgumentException - if something is wrong with the Sequence seq
NotTrainedException - if the classifier is not trained
Exception - if something went wrong

getScores

public double[] getScores(DataSet s)
                   throws Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the scores of the classifier for any Sequence in the DataSet. The scores are stored in the array according to the index of the Sequence in the DataSet.

Only for 2-class-classifiers.

Overrides:
getScores in class AbstractScoreBasedClassifier
Parameters:
s - the DataSet
Returns:
the array of scores
Throws:
Exception - if something went wrong

setInitParameters

public void setInitParameters(double[] parameters)
Sets the initial parameters of the sampling to parameters.

Parameters:
parameters - the initial parameters

isInitialized

public boolean isInitialized()
Description copied from class: AbstractClassifier
This method gives information about the state of the classifier.

Specified by:
isInitialized in class AbstractClassifier
Returns:
true if the classifier is initialized and therefore able to classify sequences, otherwise false

doSingleSampling

public void doSingleSampling(DataSet[] s,
                             double[][] weights,
                             int numSteps,
                             String outfilePrefix)
                      throws Exception
Does a single sampling run for a predefined number of steps.

Parameters:
s - the data
weights - the weights for the data
numSteps - the number of sampling steps
outfilePrefix - the prefix of the outfile where the parameter values are stored
Throws:
Exception - if the scoring functions could not be initialized or the sampling could not be extended, e.g. due to evaluation errors

train

public void train(DataSet[] s,
                  double[][] weights)
           throws Exception
Description copied from class: AbstractClassifier
This method trains a classifier over an array of weighted DataSets. Therefore, the following has to be fulfilled:

This method should work non-incrementally, like the method AbstractClassifier.train(DataSet...).
This method should check that the DataSets are defined over the underlying alphabet and length.

Specified by:
train in class AbstractClassifier
Parameters:
s - an array of DataSets
weights - the weights for the DataSets
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
AbstractClassifier.train(DataSet...)

precomputeBurnInLength

protected void precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc)
                               throws Exception
Precomputes the length of the burn-in phase, which is useful, e.g., for computing scores of multiple sequences.

Parameters:
sfsc - the current sampling component
Throws:
Exception - if the parameter values could not be parsed

getBestParameters

protected double[] getBestParameters()
                              throws Exception
Returns the sampled parameter values with the maximum value of the objective function

Returns:
the best parameters
Throws:
Exception - if the parameter values could not be parsed
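
Selecting the sampled parameter values with the maximum value of the objective function amounts to an argmax over the stored samples. The following sketch works on in-memory arrays; the class BestParametersSketch and its method best are hypothetical, and reading the samples and their scores from the parameter files is left out.

```java
final class BestParametersSketch {

    /**
     * Returns a copy of the sample with the largest score.
     *
     * samples one row of parameter values per sampling step
     * scores  the objective function value of each row
     */
    static double[] best(double[][] samples, double[] scores) {
        if (samples.length == 0 || samples.length != scores.length) {
            throw new IllegalArgumentException("samples and scores must be non-empty and of equal length");
        }
        int bestIndex = 0;
        for (int t = 1; t < scores.length; t++) {
            if (scores[t] > scores[bestIndex]) {
                bestIndex = t;
            }
        }
        return samples[bestIndex].clone();
    }
}
```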

getMeanParameters

protected double[] getMeanParameters(boolean testBurnIn,
                                     int minBurnInSteps)
                              throws Exception
Returns the mean parameters over all samplings of all stationary phases.

Parameters:
testBurnIn - true if the length of the burn-in phase shall be computed
minBurnInSteps - minimum number of steps considered as burn-in
Returns:
the mean parameters
Throws:
Exception - if the parameter values could not be parsed
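
Averaging over the stationary phases of several samplings can be sketched as follows. The class MeanParametersSketch, the three-dimensional array layout runs[r][t][i] (sampling r, step t, parameter i), and the per-run burn-in lengths are assumptions for illustration; the actual method reads the values from the parameter files and may determine the burn-in length differently.

```java
final class MeanParametersSketch {

    /**
     * Computes the mean of each parameter over all steps that lie after the
     * burn-in phase of their respective sampling run.
     *
     * runs   runs[r][t][i] is parameter i at step t of sampling r (assumed layout)
     * burnIn burnIn[r] is the number of leading steps of sampling r to skip
     */
    static double[] mean(double[][][] runs, int[] burnIn) {
        double[] sum = null;
        int count = 0;
        for (int r = 0; r < runs.length; r++) {
            for (int t = burnIn[r]; t < runs[r].length; t++) {
                if (sum == null) {
                    sum = new double[runs[r][t].length];
                }
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += runs[r][t][i];
                }
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no stationary samples available");
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count;
        }
        return sum;
    }
}
```

For two runs with a burn-in length of one step each, the first row of every run is skipped and the remaining rows are pooled before averaging.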

joinAndSetParameterFiles

public void joinAndSetParameterFiles(boolean add,
                                     File... files)
                              throws Exception
Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier

Parameters:
add - if true, parameter files are appended to the current ones, i.e., the number of samplings is augmented by these files
files - the parameter files
Throws:
Exception - if the parameter files could not be joined