public abstract class SamplingScoreBasedClassifier extends AbstractScoreBasedClassifier
A classifier that samples the parameters of SamplingDifferentiableStatisticalModels by the Metropolis-Hastings algorithm.
The distribution the parameters are sampled from is the distribution
represented by the DiffSSBasedOptimizableFunction returned by
getFunction(DataSet[], double[][]). As proposal distribution, a Gaussian distribution with given sampling
variance is used for each parameter.
Specifically, a new set of parameters \( \vec{\lambda}^{t} \) is drawn from a proposal distribution \( Q(\vec{\lambda}^{t}|\vec{\lambda}^{t-1}) \), where
\[ Q(\vec{\lambda}^{t}|\vec{\lambda}^{t-1}) = \prod_{i} \mathcal{N}(\lambda_i^{t}|\lambda_i^{t-1},\sigma_i^2)\]
and \( \sigma_i^2 \) is the sampling variance for parameter \( \lambda_i \). The sampling variances are adapted to the
size of the event space of each parameter based on a class-dependent variance provided to the constructor. This adaption depends on the correct
implementation of DifferentiableStatisticalModel.getSizeOfEventSpaceForRandomVariablesOfParameter(int). Let \( s_i \) be the size of the event space of the random variable of parameter \( \lambda_i \), and let \( \sigma_c^2 \) be the class-dependent variance for the SamplingDifferentiableStatisticalModel that \( \lambda_i \) is a parameter of. Then
. If
then
.
After a new set of parameters \( \vec{\lambda}^{t} \) has been drawn, the sampling process decides if this new set of parameters is accepted according to the distribution \( P(\vec{\lambda}) \) that we want to sample from.
Specifically, the parameters are accepted iff
\[ \alpha < \frac{ P(\vec{\lambda}^{t})Q(\vec{\lambda}^{t-1} | \vec{\lambda}^{t}) }{P(\vec{\lambda}^{t-1}) Q(\vec{\lambda}^{t} | \vec{\lambda}^{t-1})},\]
where \( \alpha \) is drawn from a uniform distribution in \( [0,1) \), i.e. Random.nextDouble().
Otherwise, the parameters are rejected and \( \vec{\lambda}^{t} = \vec{\lambda}^{t-1} \).
Since the Gaussian distribution is symmetric around its mean, \( Q(\vec{\lambda}^{t}|\vec{\lambda}^{t-1}) = Q(\vec{\lambda}^{t-1}|\vec{\lambda}^{t}) \), both terms cancel, and the acceptance probability depends only on the current and previous values of \( P(\vec{\lambda}) \).
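The acceptance rule described above can be sketched in plain Java, independent of the Jstacs classes. This is a minimal illustration, not the implementation of this class: the class name `MetropolisHastingsSketch`, the `logP` function and the fixed seed are assumptions, and comparing in log space (instead of forming the ratio of P directly) is a choice made here for numerical stability.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

class MetropolisHastingsSketch {

    /**
     * Performs one Metropolis-Hastings step with an independent Gaussian
     * proposal per parameter and returns the resulting parameter vector.
     * logP is a hypothetical stand-in for the log of the target distribution.
     */
    static double[] step(double[] previous, double[] variances,
                         ToDoubleFunction<double[]> logP, Random rnd) {
        double[] proposal = new double[previous.length];
        for (int i = 0; i < previous.length; i++) {
            // lambda_i^t ~ N(lambda_i^{t-1}, sigma_i^2)
            proposal[i] = previous[i] + Math.sqrt(variances[i]) * rnd.nextGaussian();
        }
        // The symmetric proposal terms cancel, so only P(new)/P(old) remains.
        double logRatio = logP.applyAsDouble(proposal) - logP.applyAsDouble(previous);
        double alpha = rnd.nextDouble(); // uniform in [0,1)
        // alpha < ratio, compared in log space
        if (Math.log(alpha) < logRatio) {
            return proposal;
        }
        return previous.clone(); // rejected: keep the previous parameters
    }

    public static void main(String[] args) {
        // Example target: unnormalized log density of a standard normal.
        ToDoubleFunction<double[]> logP = x -> {
            double s = 0;
            for (double v : x) {
                s -= 0.5 * v * v;
            }
            return s;
        };
        Random rnd = new Random(42);
        double[] state = {3.0, -3.0};
        double[] variances = {0.5, 0.5};
        double sum = 0;
        int steps = 10000;
        for (int t = 0; t < steps; t++) {
            state = step(state, variances, logP, rnd);
            sum += state[0];
        }
        System.out.println("mean of first parameter: " + sum / steps);
    }
}
```

The sketch proposes all parameters at once; the SamplingScheme of this class may organize the proposals differently.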
All sampled parameters are stored to separate temporary files for each concurrent sampling run by an internal SamplingComponent. The contents of these files
are stored together with the remaining representation of the SamplingScoreBasedClassifier, if AbstractClassifier.toXML() is called, and, hence,
can be stored to a monolithic file containing all information for, e.g., later classification procedures.
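As an illustration of per-run temporary parameter files, the following sketch uses only the standard `java.io.File` API. The class name `ParameterFileSketch`, the file naming and the tab-separated line format are assumptions made for this example; the actual file layout used by the SamplingComponent is not described in this documentation.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

class ParameterFileSketch {

    /** Creates one temporary file per sampling run; a null tempDir selects the OS default. */
    static File createParameterFile(String prefix, int run, File tempDir,
                                    boolean deleteOnExit) throws IOException {
        File file = File.createTempFile(prefix + "-run" + run + "-", ".txt", tempDir);
        if (deleteOnExit) {
            // Once requested, File.deleteOnExit() cannot be cancelled.
            file.deleteOnExit();
        }
        return file;
    }

    /** Appends one sampled parameter vector as a line: step, score, parameters. */
    static void appendStep(File file, int step, double score,
                           double[] parameters) throws IOException {
        StringBuilder line = new StringBuilder();
        line.append(step).append('\t').append(score);
        for (double p : parameters) {
            line.append('\t').append(p);
        }
        try (BufferedWriter w = new BufferedWriter(new FileWriter(file, true))) {
            w.write(line.toString());
            w.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        for (int run = 0; run < 2; run++) {
            File f = createParameterFile("sampling", run, null, true);
            appendStep(f, 0, -12.5, new double[]{0.1, -0.3});
            appendStep(f, 1, -11.0, new double[]{0.2, -0.1});
            List<String> lines = Files.readAllLines(f.toPath());
            System.out.println(f.getName() + " holds " + lines.size() + " steps");
        }
    }
}
```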
For determining the length of the burn-in phase and, as a consequence, the beginning of the stationary phase, a BurnInTest can be provided to the constructor of the classifier.

| Modifier and Type | Class and Description |
|---|---|
| protected class | SamplingScoreBasedClassifier.DiffSMSamplingComponent: The SamplingComponent that handles storing and loading sampled parameter values to and from files. |
| static class | SamplingScoreBasedClassifier.SamplingScheme: Sampling scheme for sampling the parameters of the scoring functions. |
Nested classes inherited from class AbstractScoreBasedClassifier: AbstractScoreBasedClassifier.DoubleTableResult

| Modifier and Type | Field and Description |
|---|---|
| protected BurnInTest | burnInTest: The BurnInTest, may be null for no test |
| protected double[] | currentParameters: the currently accepted parameters |
| protected double | currentScore: The score achieved using currentParameters |
| protected double[] | initParameters: The initial parameters if set by setInitParameters(double[]), null otherwise |
| protected double[][] | lastParameters: The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest |
| protected double[] | lastScore: The scores yielded for the parameters in lastParameters |
| protected SamplingScoreBasedClassifierParameterSet | params: Parameters |
| protected double[] | previousParameters: The previously accepted parameters, backup for rollbacks |
| protected SamplingDifferentiableStatisticalModel[] | scoringFunctions |

| Modifier | Constructor and Description |
|---|---|
| protected | SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingDifferentiableStatisticalModel... scoringFunctions): Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes. |
| | SamplingScoreBasedClassifier(StringBuffer xml): This is the constructor for Storable. |

| Modifier and Type | Method and Description |
|---|---|
| protected double | doOneSamplingStep(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue): Performs one sampling step, i.e., one sampling of all parameter values. |
| void | doSingleSampling(DataSet[] s, double[][] weights, int numSteps, String outfilePrefix): Does a single sampling run for a predefined number of steps. |
| protected void | extractFurtherClassifierInfosFromXML(StringBuffer xml): Extracts further information of a classifier from an XML representation. |
| protected double[] | getBestParameters(): Returns the sampled parameter values with the maximum value of the objective function |
| CategoricalResult[] | getClassifierAnnotation(): Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class. |
| boolean | getDeleteOnExit(): Returns true if the temporary parameter files shall be deleted on exit of the program. |
| protected abstract DiffSSBasedOptimizableFunction | getFunction(DataSet[] data, double[][] weights): Returns the function that should be sampled from. |
| protected StringBuffer | getFurtherClassifierInfos(): This method returns further information of a classifier as a StringBuffer. |
| String | getInstanceName(): Returns a short description of the classifier. |
| protected double[] | getMeanParameters(boolean testBurnIn, int minBurnInSteps): Returns the mean parameters over all samplings of all stationary phases. |
| NumericalResultSet | getNumericalCharacteristics(): Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics(). |
| protected SamplingScoreBasedClassifier.DiffSMSamplingComponent | getSamplingComponent(): Returns a sampling component suited for this SamplingScoreBasedClassifier |
| protected double | getScore(Sequence seq, int cls, boolean check): This method returns the score for a given Sequence and a given class. |
| double[] | getScores(DataSet s) |
| File | getTempDir(): Returns the directory for parameter files set in this SamplingScoreBasedClassifier. |
| protected void | init(int starts, boolean adaptVariance, String outfilePrefix): Initializes all internal fields and initializes the scoringFunctions randomly |
| boolean | isInitialized(): This method gives information about the state of the classifier. |
| void | joinAndSetParameterFiles(boolean add, File... files): Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier |
| protected double | modifyFunctionValue(double value): Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]). |
| protected void | precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc): Precomputes the length of the burn-in phase, e.g. |
| protected void | sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc, DiffSSBasedOptimizableFunction function): Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params. |
| protected double | sampleNSteps(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.DiffSMSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme): Samples a predefined number of steps appended to the current sampling |
| void | setDeleteOnExit(boolean deleteOnExit): If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. |
| void | setInitParameters(double[] parameters): Sets the initial parameters of the sampling to parameters. |
| void | setTempDir(File tempDir): Sets the directory for parameter files set in this SamplingScoreBasedClassifier. |
| void | train(DataSet[] s, double[][] weights): This method trains a classifier over an array of weighted DataSets. |
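
The idea behind getMeanParameters(boolean, int) and getBestParameters(), averaging over the stationary phases of all samplings and picking the sample with the highest objective value, can be sketched as follows. This is an illustration under stated assumptions, not the code of this class: the in-memory arrays `samples[run][step][parameter]` and `scores[run][step]`, the per-run `burnIn` lengths and the class name `SampleSummarySketch` are all hypothetical, since the class itself reads the sampled values from its parameter files.

```java
class SampleSummarySketch {

    /** Mean of all parameter vectors sampled after the burn-in phase of each run. */
    static double[] meanAfterBurnIn(double[][][] samples, int[] burnIn) {
        double[] mean = new double[samples[0][0].length];
        long count = 0;
        for (int run = 0; run < samples.length; run++) {
            for (int t = burnIn[run]; t < samples[run].length; t++) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += samples[run][t][i];
                }
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no samples in any stationary phase");
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= count;
        }
        return mean;
    }

    /** Parameter vector with the maximum score over all runs and steps. */
    static double[] best(double[][][] samples, double[][] scores) {
        double[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int run = 0; run < samples.length; run++) {
            for (int t = 0; t < samples[run].length; t++) {
                if (best == null || scores[run][t] > bestScore) {
                    bestScore = scores[run][t];
                    best = samples[run][t];
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][][] samples = {
            {{0, 0}, {2, 4}, {4, 8}},
            {{10, 10}, {6, 0}}
        };
        double[][] scores = {{-9, -3, -5}, {-20, -4}};
        double[] mean = meanAfterBurnIn(samples, new int[]{1, 1});
        double[] top = best(samples, scores);
        System.out.println(mean[0] + " " + mean[1] + " / " + top[0] + " " + top[1]);
    }
}
```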

Methods inherited from class AbstractScoreBasedClassifier: check, check, classify, classify, clone, createDefaultClassWeights, getClassWeight, getClassWeights, getMultiClassScores, getNumberOfClasses, getPValue, getPValue, getResults, getScore, setClassWeights, setClassWeights, setThresholdClassWeights

Methods inherited from class AbstractClassifier: classify, evaluate, evaluate, getAlphabetContainer, getCharacteristics, getLength, getXMLTag, toXML, train

protected SamplingScoreBasedClassifierParameterSet params
Parameters

protected SamplingDifferentiableStatisticalModel[] scoringFunctions

protected double[] currentParameters
the currently accepted parameters

protected double[] initParameters
The initial parameters if set by setInitParameters(double[]), null otherwise

protected double currentScore
The score achieved using currentParameters

protected double[] previousParameters
The previously accepted parameters, backup for rollbacks

protected double[][] lastParameters
The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest

protected double[] lastScore
The scores yielded for the parameters in lastParameters

protected BurnInTest burnInTest
The BurnInTest, may be null for no test

public SamplingScoreBasedClassifier(StringBuffer xml) throws NonParsableException
This is the constructor for Storable.
Parameters:
xml - the xml representation
Throws:
NonParsableException - if the representation could not be parsed.

protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingDifferentiableStatisticalModel... scoringFunctions) throws CloneNotSupportedException
Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes.
Parameters:
params - the external parameters of this classifier
burnInTest - the burn-in test (or null for no burn-in test)
classVariances - the variances used for sampling for the parameters of each class
scoringFunctions - the scoring functions for each of the classes
Throws:
CloneNotSupportedException - if the scoring functions or the burn-in test could not be cloned
See Also:
VarianceRatioBurnInTest

protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.
getFurtherClassifierInfos in class AbstractScoreBasedClassifier
Returns:
further information of a classifier as a StringBuffer
See Also:
AbstractClassifier.toXML()

protected void extractFurtherClassifierInfosFromXML(StringBuffer xml) throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.
extractFurtherClassifierInfosFromXML in class AbstractScoreBasedClassifier
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractClassifier.fromXML(StringBuffer)

public CategoricalResult[] getClassifierAnnotation()
Description copied from class: AbstractClassifier
Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.
res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...
getClassifierAnnotation in class AbstractClassifier
Returns:
an array of Results that contains information about the classifier

public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from class: AbstractClassifier
Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().
getNumericalCharacteristics in class AbstractClassifier
Throws:
Exception - if some of the characteristics could not be defined

public String getInstanceName()
Description copied from class: AbstractClassifier
Returns a short description of the classifier.
getInstanceName in class AbstractClassifier

protected abstract DiffSSBasedOptimizableFunction getFunction(DataSet[] data, double[][] weights) throws Exception
Returns the function that should be sampled from.
Parameters:
data - the samples
weights - the weights of the sequences of the samples
Throws:
Exception - if the function could not be created

protected double modifyFunctionValue(double value)
Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]). This is for instance necessary in case of LogGenDisMixFunction to obtain a proper posterior or supervised posterior.
Parameters:
value - the original value

protected SamplingScoreBasedClassifier.DiffSMSamplingComponent getSamplingComponent()
Returns a sampling component suited for this SamplingScoreBasedClassifier

public File getTempDir()
Returns the directory for parameter files set in this SamplingScoreBasedClassifier. If this value is null, the default directory of the executing OS is used for the parameter files.

public void setTempDir(File tempDir)
Sets the directory for parameter files set in this SamplingScoreBasedClassifier. If tempDir is null, the default directory of the executing OS is used for the parameter files. If this value is reset after training, all sampled parameters will be lost. The value set by this method is not stored in the XML-representation.
Parameters:
tempDir - the temp directory

public boolean getDeleteOnExit()
Returns true if the temporary parameter files shall be deleted on exit of the program.

public void setDeleteOnExit(boolean deleteOnExit)
throws Exception
If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. Once this value is true, it cannot be reset to false after sampling has started, due to the restrictions of File.deleteOnExit(). If you nonetheless want to retain those parameters, you can call AbstractClassifier.toXML() and save the resulting StringBuffer, which also contains the sampled parameter values. The value set by this method is not stored in the XML-representation.
Parameters:
deleteOnExit - if temp files shall be deleted on exit
Throws:
Exception - if set to false after sampling started

protected void init(int starts, boolean adaptVariance, String outfilePrefix) throws Exception
Initializes all internal fields and initializes the scoringFunctions randomly
Parameters:
starts - number of starts
adaptVariance - if true, variance is adapted to size of event space
outfilePrefix - the prefix of the outfiles
Throws:
Exception - if the scoring functions could not be initialized

protected double sampleNSteps(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.DiffSMSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme) throws Exception
Samples a predefined number of steps appended to the current sampling
Parameters:
function - the objective function
component - the sampling component with selected sampling
test - the burn-in test
numSteps - the number of steps
scheme - the SamplingScoreBasedClassifier.SamplingScheme
Throws:
Exception - if either the function could not be evaluated on the current parameters or the sampled parameters could not be stored

protected void sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc, DiffSSBasedOptimizableFunction function) throws Exception
Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.
Parameters:
sfsc - the current sampling component
function - the objective function
Throws:
Exception - if the sampling could not be extended, e.g. due to evaluation errors

protected double doOneSamplingStep(DiffSSBasedOptimizableFunction function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue) throws Exception
Performs one sampling step, i.e., one sampling of all parameter values.
Parameters:
function - the objective function
scheme - the SamplingScoreBasedClassifier.SamplingScheme
previousValue - the value of the last sampling or minus infinity for the first sampling run
Returns:
Double.NaN if none of the sampled parameters were accepted
Throws:
Exception - if the function could not be evaluated or an unknown SamplingScoreBasedClassifier.SamplingScheme was provided

protected double getScore(Sequence seq, int cls, boolean check) throws IllegalArgumentException, NotTrainedException, Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the score for a given Sequence and a given class.
getScore in class AbstractScoreBasedClassifier
Parameters:
seq - the Sequence
cls - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
Returns:
the score for a given Sequence and a given class
Throws:
IllegalArgumentException - if something is wrong with the Sequence seq
NotTrainedException - if the classifier is not trained
Exception - if something went wrong

public double[] getScores(DataSet s) throws Exception
Description copied from class: AbstractScoreBasedClassifier
Returns the scores of the classifier for each Sequence in the DataSet. The scores are stored in the array according to the index of the Sequence in the DataSet.
getScores in class AbstractScoreBasedClassifier
Parameters:
s - the DataSet
Throws:
Exception - if something went wrong

public void setInitParameters(double[] parameters)
Sets the initial parameters of the sampling to parameters.
Parameters:
parameters - the initial parameters

public boolean isInitialized()
Description copied from class: AbstractClassifier
This method gives information about the state of the classifier.
isInitialized in class AbstractClassifier
Returns:
true if the classifier is initialized and therefore able to classify sequences, otherwise false

public void doSingleSampling(DataSet[] s, double[][] weights, int numSteps, String outfilePrefix) throws Exception
Does a single sampling run for a predefined number of steps.
Parameters:
s - the data
weights - the weights for the data
numSteps - the number of sampling steps
outfilePrefix - the prefix of the outfile where the parameter values are stored
Throws:
Exception - if the scoring functions could not be initialized or the sampling could not be extended, e.g. due to evaluation errors

public void train(DataSet[] s, double[][] weights) throws Exception
Description copied from class: AbstractClassifier
This method trains a classifier over an array of weighted DataSets. That is why the following has to be fulfilled:
s.length == weights.length
weights[i] == null || s[i].getNumberOfElements() == weights[i].length.
… AbstractClassifier.train(DataSet...).
… DataSets are defined over the underlying alphabet and length.
train in class AbstractClassifier
Parameters:
s - an array of DataSets
weights - the weights for the DataSets
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
AbstractClassifier.train(DataSet...)

protected void precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc) throws Exception
Precomputes the length of the burn-in phase, e.g.
Parameters:
sfsc - the current sampling component
Throws:
Exception - if the parameter values could not be parsed

protected double[] getBestParameters() throws Exception
Returns the sampled parameter values with the maximum value of the objective function
Throws:
Exception - if the parameter values could not be parsed

protected double[] getMeanParameters(boolean testBurnIn, int minBurnInSteps) throws Exception
Returns the mean parameters over all samplings of all stationary phases.
Parameters:
testBurnIn - true if the length of the burn-in phase shall be computed
minBurnInSteps - minimum number of steps considered as burn-in
Throws:
Exception - if the parameter values could not be parsed

public void joinAndSetParameterFiles(boolean add, File... files) throws Exception
Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier
Parameters:
add - if true, parameter files are appended to the current ones, i.e., the number of samplings is augmented by these files
files - the parameter files
Throws:
Exception - if the parameter files could not be joined
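
The two conditions that train(DataSet[], double[][]) places on its arguments can be checked before training is started. The following is a hedged sketch only: to stay self-contained it takes the data set sizes as a plain `int[]` (standing in for the values of `s[i].getNumberOfElements()`), and the class name `TrainArgsSketch` and the method `checkWeights` are hypothetical helpers, not part of the library.

```java
class TrainArgsSketch {

    /**
     * Checks s.length == weights.length and, for every i,
     * weights[i] == null || numberOfElements[i] == weights[i].length.
     * numberOfElements[i] stands in for s[i].getNumberOfElements().
     */
    static void checkWeights(int[] numberOfElements, double[][] weights) {
        if (numberOfElements.length != weights.length) {
            throw new IllegalArgumentException(
                "expected " + numberOfElements.length + " weight arrays, got " + weights.length);
        }
        for (int i = 0; i < weights.length; i++) {
            // A null entry is allowed and means that no weights are given for data set i.
            if (weights[i] != null && weights[i].length != numberOfElements[i]) {
                throw new IllegalArgumentException(
                    "data set " + i + " has " + numberOfElements[i]
                        + " elements but " + weights[i].length + " weights");
            }
        }
    }

    public static void main(String[] args) {
        // Two data sets with 3 and 2 sequences; only the first one is weighted.
        checkWeights(new int[]{3, 2}, new double[][]{{1.0, 0.5, 2.0}, null});
        System.out.println("weights are consistent");
    }
}
```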