de.jstacs.classifier.assessment
Class ClassifierAssessment

java.lang.Object
  extended by de.jstacs.classifier.assessment.ClassifierAssessment
Direct Known Subclasses:
KFoldCrossValidation, RepeatedHoldOutExperiment, RepeatedSubSamplingExperiment, Sampled_RepeatedHoldOutExperiment

public abstract class ClassifierAssessment
extends Object

Class defining an assessment of classifiers.
It should be used as a superclass for specialized classifier assessments such as k-fold cross-validation or subsampling cross-validation.
Several standard tasks such as classifier and model management (training, testing) are already implemented. The method assess is the standard entry point for starting a classifier assessment. Subclasses have to implement the method evaluateClassifier(), whose main job is to construct test and training subsets of the given data. These subsets can then be passed to the methods test() and train(), which are already implemented in a standard way.

Author:
andr|e gohr (a0coder (nospam:@) gmail (nospam:.) com), Jens Keilwagen

Field Summary
protected  AbstractClassifier[] myAbstractClassifier
          This array contains the internally used classifiers.
protected  boolean myBuildClassifierByCrossProduct
           
protected  Model[][] myModel
          This array contains, for each class, the internally used models.
protected  MeanResultSet[] myTempMeanResultSets
           
protected  int skipLastClassifiersDuringClassifierTraining
           
 
Constructor Summary
  ClassifierAssessment(AbstractClassifier... aCs)
          Creates a new ClassifierAssessment from a set of AbstractClassifiers.
  ClassifierAssessment(AbstractClassifier[] aCs, boolean buildClassifiersByCrossProduct, Model[]... aMs)
          This constructor allows assessing a collection of given AbstractClassifiers together with the classifiers constructed from the given Models.
protected ClassifierAssessment(AbstractClassifier[] aCs, Model[][] aMs, boolean buildClassifiersByCrossProduct, boolean checkAlphabetConsistencyAndLength)
          Creates a new ClassifierAssessment from an array of AbstractClassifiers and a two-dimensional array of Models, which are combined to additional classifiers.
  ClassifierAssessment(boolean buildClassifiersByCrossProduct, Model[]... aMs)
          Creates a new ClassifierAssessment from a set of Models.
 
Method Summary
 ListResult assess(MeasureParameters mp, ClassifierAssessmentAssessParameterSet assessPS, ProgressUpdater pU, Sample... s)
          Assesses the contained classifiers.
 ListResult assess(MeasureParameters mp, ClassifierAssessmentAssessParameterSet assessPS, ProgressUpdater pU, Sample[][]... s)
           
 ListResult assess(MeasureParameters mp, ClassifierAssessmentAssessParameterSet assessPS, Sample... s)
          Assesses the contained classifiers.
protected abstract  boolean evaluateClassifier(MeasureParameters mp, ClassifierAssessmentAssessParameterSet assessPS, Sample[] s, ProgressUpdater pU)
          This method must be implemented in all subclasses.
 AbstractClassifier[] getClassifier()
          Returns a deep copy of all classifiers that have been or will be used in this assessment.
 String getNameOfAssessment()
           
protected  void prepareAssessment(Sample... s)
          Prepares an assessment.
protected  void test(MeasureParameters mp, boolean exception, Sample... testS)
          Uses the given test-Samples to call the evaluate-methods of the local AbstractClassifiers.
protected  void train(Sample... trainS)
          Trains the local classifiers using the given trainingSamples.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

myAbstractClassifier

protected AbstractClassifier[] myAbstractClassifier
This array contains the internally used classifiers.


myModel

protected Model[][] myModel
This array contains, for each class, the internally used models. Keeping the models separately allows some models to be trained only once while they are evaluated in combination with other models.


myTempMeanResultSets

protected MeanResultSet[] myTempMeanResultSets

skipLastClassifiersDuringClassifierTraining

protected int skipLastClassifiersDuringClassifierTraining

myBuildClassifierByCrossProduct

protected boolean myBuildClassifierByCrossProduct
Constructor Detail

ClassifierAssessment

protected ClassifierAssessment(AbstractClassifier[] aCs,
                               Model[][] aMs,
                               boolean buildClassifiersByCrossProduct,
                               boolean checkAlphabetConsistencyAndLength)
                        throws IllegalArgumentException,
                               WrongAlphabetException,
                               CloneNotSupportedException,
                               ClassDimensionException
Creates a new ClassifierAssessment from an array of AbstractClassifiers and a two-dimensional array of Models, which are combined to additional classifiers. If buildClassifiersByCrossProduct is true, the cross product of all Models in aMs is built to obtain these classifiers.

Parameters:
aCs - the pre-defined classifiers
aMs - the Models that are used to build additional classifiers
buildClassifiersByCrossProduct - determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models for class i (i = 1, ..., k), i.e. the models in aMs[i-1], and let S be the cross product S_1 x S_2 x ... x S_k.

true: one classifier is constructed for every element of S, i.e. for every possible combination of k models with one model taken from each class
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[j] must have the same length, say m, and m classifiers are constructed in total.
checkAlphabetConsistencyAndLength - indicates if alphabets and lengths shall be checked for consistency
Throws:
IllegalArgumentException
WrongAlphabetException
CloneNotSupportedException
ClassDimensionException
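
The two construction modes described for buildClassifiersByCrossProduct can be illustrated with a minimal, self-contained sketch. It does not use the Jstacs API: plain Strings stand in for Model objects, and the class and method names (ModelCombinationSketch, crossProduct, indexWise) are hypothetical names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: model names (Strings) replace Jstacs Model objects.
final class ModelCombinationSketch {

    // "true" mode: one combination per element of S_1 x S_2 x ... x S_k.
    static List<String[]> crossProduct(String[][] aMs) {
        List<String[]> result = new ArrayList<>();
        result.add(new String[0]); // start with the empty combination
        for (String[] modelsOfClass : aMs) {
            List<String[]> extended = new ArrayList<>();
            for (String[] partial : result) {
                for (String model : modelsOfClass) {
                    // append one model of the current class to each partial combination
                    String[] next = Arrays.copyOf(partial, partial.length + 1);
                    next[partial.length] = model;
                    extended.add(next);
                }
            }
            result = extended;
        }
        return result;
    }

    // "false" mode: combination i uses aMs[0][i], aMs[1][i], ..., aMs[k-1][i].
    static List<String[]> indexWise(String[][] aMs) {
        List<String[]> result = new ArrayList<>();
        if (aMs.length == 0) {
            return result;
        }
        int m = aMs[0].length;
        for (String[] modelsOfClass : aMs) {
            if (modelsOfClass.length != m) {
                throw new IllegalArgumentException("all aMs[j] must have the same length");
            }
        }
        for (int i = 0; i < m; i++) {
            String[] combination = new String[aMs.length];
            for (int j = 0; j < aMs.length; j++) {
                combination[j] = aMs[j][i];
            }
            result.add(combination);
        }
        return result;
    }

    public static void main(String[] args) {
        // two classes (foreground, background) with two candidate models each
        String[][] aMs = { { "fg1", "fg2" }, { "bg1", "bg2" } };
        for (String[] c : crossProduct(aMs)) {
            System.out.println("cross product: " + Arrays.toString(c));
        }
        for (String[] c : indexWise(aMs)) {
            System.out.println("index-wise:    " + Arrays.toString(c));
        }
    }
}
```

With two classes and two models per class, the cross product yields four combinations, whereas the index-wise mode yields two.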

ClassifierAssessment

public ClassifierAssessment(AbstractClassifier... aCs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
Creates a new ClassifierAssessment from a set of AbstractClassifiers.

Parameters:
aCs - contains the classifiers to be assessed
If model-based classifiers are trained, the order of the models within each classifier determines which model is trained on which sample in the method assess().
For a two-class problem, it is recommended to
  • initialize the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initialize an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • pass the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
IllegalArgumentException
WrongAlphabetException - if not all given classifiers are defined on the same AlphabetContainer
ClassDimensionException
CloneNotSupportedException

ClassifierAssessment

public ClassifierAssessment(boolean buildClassifiersByCrossProduct,
                            Model[]... aMs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
Creates a new ClassifierAssessment from a set of Models. The argument buildClassifiersByCrossProduct determines how these Models are combined to classifiers.

Parameters:
buildClassifiersByCrossProduct -
Determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models for class i (i = 1, ..., k), i.e. the models in aMs[i-1], and let S be the cross product S_1 x S_2 x ... x S_k.

true: one classifier is constructed for every element of S, i.e. for every possible combination of k models with one model taken from each class
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[j] must have the same length, say m, and m classifiers are constructed in total.
aMs -
Contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (its length is k), and the second dimension (aMs[i]) contains the models for class i.
If models are trained directly (during assessment), the order of the models given when this assessment object is created determines which sample is used to train which model. In general, the first model is trained on the first sample in s, the second model on the second sample, and so on.
For a two-class problem, it is recommended to
  • initialize the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initialize an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • pass the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
WrongAlphabetException - if not all given models are defined on the same AlphabetContainer
IllegalArgumentException
CloneNotSupportedException
ClassDimensionException

ClassifierAssessment

public ClassifierAssessment(AbstractClassifier[] aCs,
                            boolean buildClassifiersByCrossProduct,
                            Model[]... aMs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
This constructor allows assessing a collection of given AbstractClassifiers together with the classifiers constructed from the given Models.

Parameters:
aCs - contains some AbstractClassifiers that should be assessed in addition to the AbstractClassifiers constructed from the given Models
buildClassifiersByCrossProduct -
Determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models for class i (i = 1, ..., k), i.e. the models in aMs[i-1], and let S be the cross product S_1 x S_2 x ... x S_k.

true: one classifier is constructed for every element of S, i.e. for every possible combination of k models with one model taken from each class
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[j] must have the same length, say m, and m classifiers are constructed in total.
aMs -
Contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (its length is k), and the second dimension (aMs[i]) contains the models for class i.
If models are trained directly (during assessment), the order of the models given when this assessment object is created determines which sample is used to train which model. In general, the first model is trained on the first sample in s, the second model on the second sample, and so on.
For a two-class problem, it is recommended to
  • initialize the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initialize an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • pass the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
WrongAlphabetException - if not all given models are defined on the same AlphabetContainer
IllegalArgumentException
CloneNotSupportedException
ClassDimensionException
Method Detail

assess

public ListResult assess(MeasureParameters mp,
                         ClassifierAssessmentAssessParameterSet assessPS,
                         Sample... s)
                  throws IllegalArgumentException,
                         WrongAlphabetException,
                         Exception
Assesses the contained classifiers.

Parameters:
s - contains the data to be used for assessment. The order of the samples is important.
If model-based classifiers are trained, the order of the models within each classifier determines which model is trained on which sample: the first model of a classifier is trained on the first sample in s. If models are trained directly, the order of the models given when this assessment object was created determines which sample is used to train which model. In general, the first model is trained on the first sample in s, the second model on the second sample, and so on.
For a two-class problem, it is recommended to
  • initialize the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initialize an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • pass the data s in the order (s[0] contains foreground data, s[1] contains background data)
mp - defines which performance-measure should be used to assess classifiers
assessPS - contains some parameters necessary for assessment (depends on the kind of assessment!)
Returns:
a ListResult containing the results (means and standard errors) of the performance measures specified via the given MeasureParameters
Throws:
IllegalArgumentException - if given assessPS is not of right type (see method evaluateClassifier())
WrongAlphabetException - if given Samples s do not use the same AlphabetContainer as contained classifiers/models
Exception - forwarded from training/testing of classifiers/models
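
The returned results are described as means and standard errors of the chosen performance measures. As a rough illustration of what such a summary over several assessment iterations looks like, the following self-contained sketch computes both quantities for one measure. It does not use the Jstacs API, the class name MeanAndStandardErrorSketch is hypothetical, and the estimator shown (sample standard deviation divided by the square root of the number of iterations) is an assumption; this page does not state which estimator the library uses.

```java
// Illustrative stand-in only: summarizes one performance measure over several iterations.
final class MeanAndStandardErrorSketch {

    // Arithmetic mean of the per-iteration values.
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Standard error of the mean, assuming the unbiased sample variance (divisor n - 1).
    static double standardError(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least two values are required");
        }
        double mean = mean(values);
        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        double sampleVariance = sumOfSquares / (n - 1);
        return Math.sqrt(sampleVariance / n);
    }

    public static void main(String[] args) {
        // hypothetical values of one measure obtained in three iterations
        double[] measurePerIteration = { 0.8, 0.9, 1.0 };
        System.out.println("mean           = " + mean(measurePerIteration));
        System.out.println("standard error = " + standardError(measurePerIteration));
    }
}
```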

assess

public ListResult assess(MeasureParameters mp,
                         ClassifierAssessmentAssessParameterSet assessPS,
                         ProgressUpdater pU,
                         Sample... s)
                  throws IllegalArgumentException,
                         WrongAlphabetException,
                         Exception
Assesses the contained classifiers.

Parameters:
s - contains the data to be used for assessment. The order of the samples is important.
If model-based classifiers are trained, the order of the models within each classifier determines which model is trained on which sample: the first model of a classifier is trained on the first sample in s. If models are trained directly, the order of the models given when this assessment object was created determines which sample is used to train which model. In general, the first model is trained on the first sample in s, the second model on the second sample, and so on.
For a two-class problem, it is recommended to
  • initialize the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initialize an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • pass the data s in the order (s[0] contains foreground data, s[1] contains background data)
mp - defines which performance-measure should be used to assess classifiers
assessPS - contains some parameters necessary for assessment (depends on the kind of assessment!)
pU - this ProgressUpdater may be used to cancel the method assess(): once pU.isCanceled() returns true, assess aborts but still returns the results computed so far.
For certain assessments, for example KFoldCrossValidation, aborting is not allowed. In such a case it may be wise to override this method so that it just returns an error message.
pU may be null, although in that case it is more convenient to use the overload of assess that does not take a ProgressUpdater.
Returns:
a ListResult containing the results (means and standard errors) of the performance measures specified via the given MeasureParameters
Throws:
IllegalArgumentException - if given assessPS is not of right type (see method evaluateClassifier())
WrongAlphabetException - if given Samples s do not use the same AlphabetContainer as contained classifiers/models
Exception - forwarded from training/testing of classifiers/models

assess

public ListResult assess(MeasureParameters mp,
                         ClassifierAssessmentAssessParameterSet assessPS,
                         ProgressUpdater pU,
                         Sample[][]... s)
                  throws IllegalArgumentException,
                         WrongAlphabetException,
                         Exception
Parameters:
s - contains the data to be used for assessment.
The order of the samples in s is important. The layout is s[iter][train/test][class]: the first dimension selects the iteration iter, and the second dimension selects the training samples (s[iter][0]) or the test samples (s[iter][1]) used in that iteration. s[iter][0] contains one training sample for each class; analogously, s[iter][1] contains one test sample for each class. For further details on the order of samples, see the comment of the method assess(MeasureParameters, ClassifierAssessmentAssessParameterSet, Sample...).
The caller is responsible for ensuring, if desired, that the given test and training data sets do not overlap.
mp - defines which performance-measure should be used to assess classifiers
assessPS - contains some parameters necessary for assessment. Must be of type ClassifierAssessmentAssessParameterSet.
pU - this ProgressUpdater allows aborting this classifier assessment. If pU.isCanceled() is true, all results already computed will be returned. A null reference is allowed.
Returns:
a ListResult containing the results (means and standard errors) of the performance measures specified via the given MeasureParameters
Throws:
IllegalArgumentException - if given assessPS is not of right type (see method evaluateClassifier())
WrongAlphabetException - if given Samples s do not use the same AlphabetContainer as contained classifiers/models
Exception - forwarded from training/testing of classifiers/models
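
The three-dimensional layout s[iter][train/test][class] can be easier to follow with a small example. The sketch below is self-contained and does not use the Jstacs API: an int[] stands in for one Sample (so the array has one extra dimension), and the class name SplitLayoutSketch, the method buildSplits, and the simple hold-out rule are assumptions made for this illustration only.

```java
// Illustrative stand-in only: an int[] plays the role of one Sample.
final class SplitLayoutSketch {

    // Returns s[iter][0 = train, 1 = test][class]; the innermost int[] is one "sample".
    // In iteration iter, every element whose index i satisfies i % iterations == iter
    // is held out as test data, so training and test data never overlap here.
    static int[][][][] buildSplits(int[][] dataPerClass, int iterations) {
        if (iterations < 2) {
            throw new IllegalArgumentException("at least two iterations are required");
        }
        int classes = dataPerClass.length;
        int[][][][] s = new int[iterations][2][classes][];
        for (int iter = 0; iter < iterations; iter++) {
            for (int c = 0; c < classes; c++) {
                int[] data = dataPerClass[c];
                int testCount = 0;
                for (int i = 0; i < data.length; i++) {
                    if (i % iterations == iter) {
                        testCount++;
                    }
                }
                int[] train = new int[data.length - testCount];
                int[] test = new int[testCount];
                int tr = 0;
                int te = 0;
                for (int i = 0; i < data.length; i++) {
                    if (i % iterations == iter) {
                        test[te++] = data[i];
                    } else {
                        train[tr++] = data[i];
                    }
                }
                s[iter][0][c] = train; // training data of class c in iteration iter
                s[iter][1][c] = test;  // test data of class c in iteration iter
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // class 0 = foreground data, class 1 = background data
        int[][] dataPerClass = { { 1, 2, 3, 4 }, { 10, 20, 30, 40 } };
        int[][][][] s = buildSplits(dataPerClass, 2);
        System.out.println("iteration 0, train, foreground: " + java.util.Arrays.toString(s[0][0][0]));
        System.out.println("iteration 0, test,  foreground: " + java.util.Arrays.toString(s[0][1][0]));
    }
}
```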

getClassifier

public AbstractClassifier[] getClassifier()
                                   throws CloneNotSupportedException
Returns a deep copy of all classifiers that have been or will be used in this assessment.

Returns:
a deep copy of all used classifiers
Throws:
CloneNotSupportedException - if it is impossible to get a deep copy of at least one classifier
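
Returning a deep copy means that callers can modify the returned classifiers without affecting the ones held by the assessment. The following self-contained sketch illustrates that behaviour with a hypothetical stand-in class; DeepCopySketch and its nested Classifier are not part of Jstacs and are assumptions for illustration only.

```java
// Illustrative stand-in only: shows why a getter that returns a deep copy protects internal state.
final class DeepCopySketch {

    // Minimal cloneable "classifier" with one mutable parameter.
    static final class Classifier implements Cloneable {
        double threshold;

        Classifier(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public Classifier clone() {
            try {
                return (Classifier) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    private final Classifier[] internal;

    DeepCopySketch(Classifier... classifiers) {
        this.internal = deepCopy(classifiers);
    }

    // Returns clones of all elements, not just a copy of the array.
    Classifier[] getClassifier() {
        return deepCopy(internal);
    }

    // Exposes the internal value so the effect of the deep copy can be observed.
    double internalThreshold(int index) {
        return internal[index].threshold;
    }

    private static Classifier[] deepCopy(Classifier[] classifiers) {
        Classifier[] copy = new Classifier[classifiers.length];
        for (int i = 0; i < classifiers.length; i++) {
            copy[i] = classifiers[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        DeepCopySketch assessment = new DeepCopySketch(new Classifier(0.5));
        Classifier[] copies = assessment.getClassifier();
        copies[0].threshold = 0.9; // changes only the copy
        System.out.println("internal threshold: " + assessment.internalThreshold(0));
    }
}
```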

getNameOfAssessment

public String getNameOfAssessment()
Returns:
name of this class

evaluateClassifier

protected abstract boolean evaluateClassifier(MeasureParameters mp,
                                              ClassifierAssessmentAssessParameterSet assessPS,
                                              Sample[] s,
                                              ProgressUpdater pU)
                                       throws IllegalArgumentException,
                                              Exception
This method must be implemented in all subclasses. It should perform the following tasks:
1.) create test and training data sets
2.) call the method train to train the classifiers/models using the training data
3.) call the method test to evaluate (test) the trained classifiers

Parameters:
mp - defines which performance-measures are used to assess classifiers
assessPS - contains assessment-specific parameters (for example, the number of iterations of a k-fold cross-validation)
s - data to be used for assessment (both test and training data)
pU - a ProgressUpdater whose main purpose is to allow the user to cancel a currently running classifier assessment. This ProgressUpdater is guaranteed not to be null. For certain assessments, for example KFoldCrossValidation, aborting is not allowed. In such a case the given ProgressUpdater should be ignored.

Usage:
  • pU.setMax(number of iterations of the assessment loop);
  • iteration=0;
  • assessment loop
    • pU.setValue(iteration+1);
    • Sample treatment
    • train();
    • test();
    • iteration++;
  • repeat until ready or pU.isCanceled()
Returns:
true, if no errors occurred
Throws:
IllegalArgumentException - if the given AssessParameterSet is of wrong type
Exception - if an error occurred during training or using classifiers/models
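
The usage pattern described for evaluateClassifier can be sketched as a plain loop. The sketch is self-contained and does not use the Jstacs API: the nested Progress interface merely mirrors the three calls named in the usage list (setMax, setValue, isCanceled), and AssessmentLoopSketch, run, and CancelAfter are hypothetical names. A Runnable stands in for sample treatment followed by train() and test().

```java
// Illustrative stand-in only: the loop structure of an assessment with cancellation support.
final class AssessmentLoopSketch {

    // Hypothetical interface mirroring the calls named in the usage list.
    interface Progress {
        void setMax(int max);
        void setValue(int value);
        boolean isCanceled();
    }

    // Simple Progress implementation that reports cancellation once `limit` iterations were started.
    static final class CancelAfter implements Progress {
        private final int limit;
        private int value;

        CancelAfter(int limit) {
            this.limit = limit;
        }

        @Override
        public void setMax(int max) {
            // nothing to display in this sketch
        }

        @Override
        public void setValue(int value) {
            this.value = value;
        }

        @Override
        public boolean isCanceled() {
            return value >= limit;
        }
    }

    // Runs the loop and returns the number of completed iterations.
    static int run(int iterations, Progress pU, Runnable trainAndTest) {
        if (iterations < 1) {
            throw new IllegalArgumentException("at least one iteration is required");
        }
        pU.setMax(iterations);
        int iteration = 0;
        do {
            pU.setValue(iteration + 1);
            trainAndTest.run(); // stand-in for: sample treatment, train(), test()
            iteration++;
        } while (iteration < iterations && !pU.isCanceled());
        return iteration;
    }

    public static void main(String[] args) {
        int completed = run(5, new CancelAfter(2), () -> System.out.println("train and test"));
        System.out.println("completed iterations: " + completed);
    }
}
```

The loop stops either when all iterations are done or as soon as cancellation is reported, and the iterations finished up to that point remain available, which matches the description that already computed results are returned after an abort.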

prepareAssessment

protected void prepareAssessment(Sample... s)
                          throws WrongAlphabetException
Prepares an assessment. If the given Samples cannot be used for this assessment, this method throws an exception.
Furthermore, MeanResultSets are initialized for this assessment (one for each contained classifier).

Parameters:
s - the Sample to be checked
Throws:
WrongAlphabetException - if

1. s is null or does not have the required length (number of classes)
2. the AlphabetContainers of s are not consistent with the AlphabetContainers of the local models or classifiers

test

protected void test(MeasureParameters mp,
                    boolean exception,
                    Sample... testS)
             throws SimpleParameter.IllegalValueException,
                    MeanResultSet.InconsistentResultNumberException,
                    MeanResultSet.AdditionImpossibleException,
                    Exception
Uses the given test Samples to call the evaluate methods of the local AbstractClassifiers. The returned NumericalResults as well as the numerical characteristics are added to each classifier's MeanResultSet.
It should not be necessary to override this method in subclasses.

Parameters:
mp - determines which performance-measures are used to assess the classifiers
exception - whether an Exception should be thrown if some MeasureParameters.Measure could not be evaluated
testS - samples used as test-sets (has to contain one Sample for each class)
Throws:
IllegalArgumentException - if the length of testS is not equal to the dimension of the classification-problem (testS.length!=this.myAbstractClassifier[0].getNumberOfClasses())
SimpleParameter.IllegalValueException
MeanResultSet.InconsistentResultNumberException
MeanResultSet.AdditionImpossibleException
Exception
See Also:
AbstractClassifier.evaluate(MeasureParameters, boolean, Sample...)

train

protected void train(Sample... trainS)
              throws IllegalArgumentException,
                     Exception
Trains the local classifiers using the given training samples.
The classifiers are trained either directly or via training of the local models. The second option is always used if the ClassifierAssessment object was constructed using Models.

It should not be necessary to override this method in subclasses.

Parameters:
trainS - samples used as train-sets (has to contain one Sample for each class)
Throws:
IllegalArgumentException - if the length of trainS is not equal to the dimension of the classification-problem (trainS.length!=this.myAbstractClassifier[0].getNumberOfClasses())
Exception