de.jstacs.classifier.assessment
Class RepeatedHoldOutExperiment

java.lang.Object
  extended by de.jstacs.classifier.assessment.ClassifierAssessment
      extended by de.jstacs.classifier.assessment.RepeatedHoldOutExperiment

public class RepeatedHoldOutExperiment
extends ClassifierAssessment

This class implements a repeated holdout experiment for assessing classifiers. The methodology is as follows. The user supplies a data set for each class the classifiers are able to predict. In each step, every given data set is randomly partitioned into mutually exclusive test and training data sets of user-specified size. The training data sets are then used to train the classifiers, and the test data sets are used to assess how well the classifiers predict their elements, using user-specified assessment measures. Additionally, the user defines how often this procedure is repeated.
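The splitting scheme described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Jstacs implementation: the class HoldOutSplitSketch, its methods, and the use of a single test fraction (instead of Jstacs's own parameter objects) are hypothetical. Each class's data set is shuffled and cut into disjoint test and training parts, and the whole procedure is repeated a fixed number of times.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of repeated holdout splitting (not the Jstacs API). */
final class HoldOutSplitSketch {

    /** Training and test part of one class's data for one repetition. */
    public static final class Split<T> {
        public final List<T> train;
        public final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Randomly partitions data into disjoint training and test parts. */
    static <T> Split<T> split(List<T> data, double testFraction, Random rnd) {
        if (testFraction < 0.0 || testFraction > 1.0) {
            throw new IllegalArgumentException("testFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rnd);
        int testSize = (int) Math.round(testFraction * shuffled.size());
        // the first testSize shuffled elements form the test part, the rest the training part
        return new Split<>(new ArrayList<>(shuffled.subList(testSize, shuffled.size())),
                           new ArrayList<>(shuffled.subList(0, testSize)));
    }

    /** Splits every class-specific data set once per repetition. */
    static <T> List<List<Split<T>>> repeatedHoldOut(List<List<T>> perClassData,
                                                   double testFraction,
                                                   int repeats, long seed) {
        Random rnd = new Random(seed);
        List<List<Split<T>>> rounds = new ArrayList<>();
        for (int r = 0; r < repeats; r++) {
            List<Split<T>> round = new ArrayList<>();
            for (List<T> classData : perClassData) {
                round.add(split(classData, testFraction, rnd)); // one split per class
            }
            rounds.add(round); // classifiers would be trained and tested on this round
        }
        return rounds;
    }
}
```

Splitting each class separately keeps the class proportions of the original data in both parts; in a real assessment, training and testing would happen inside the repetition loop.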

Author:
andr|e gohr (a0coder (nospam:@) gmail (nospam:.) com)

Field Summary
 
Fields inherited from class de.jstacs.classifier.assessment.ClassifierAssessment
myAbstractClassifier, myBuildClassifierByCrossProduct, myModel, myTempMeanResultSets, skipLastClassifiersDuringClassifierTraining
 
Constructor Summary
  RepeatedHoldOutExperiment(AbstractClassifier... aCs)
          Creates a new RepeatedHoldOutExperiment from a set of AbstractClassifiers.
  RepeatedHoldOutExperiment(AbstractClassifier[] aCs, boolean buildClassifiersByCrossProduct, Model[]... aMs)
          This constructor allows assessing a collection of given AbstractClassifiers together with those constructed from the given Models by a RepeatedHoldOutExperiment.
protected RepeatedHoldOutExperiment(AbstractClassifier[] aCs, Model[][] aMs, boolean buildClassifiersByCrossProduct, boolean checkAlphabetConsistencyAndLength)
          Creates a new RepeatedHoldOutExperiment from an array of AbstractClassifiers and a two-dimensional array of Models, which are combined into additional classifiers.
  RepeatedHoldOutExperiment(boolean buildClassifiersByCrossProduct, Model[]... aMs)
          Creates a new RepeatedHoldOutExperiment from a set of Models.
 
Method Summary
protected  boolean evaluateClassifier(MeasureParameters mp, ClassifierAssessmentAssessParameterSet assessPS, Sample[] s, ProgressUpdater pU)
          This method must be implemented in all subclasses.
 
Methods inherited from class de.jstacs.classifier.assessment.ClassifierAssessment
assess, assess, assess, getClassifier, getNameOfAssessment, prepareAssessment, test, train
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RepeatedHoldOutExperiment

protected RepeatedHoldOutExperiment(AbstractClassifier[] aCs,
                                    Model[][] aMs,
                                    boolean buildClassifiersByCrossProduct,
                                    boolean checkAlphabetConsistencyAndLength)
                             throws IllegalArgumentException,
                                    WrongAlphabetException,
                                    CloneNotSupportedException,
                                    ClassDimensionException
Creates a new RepeatedHoldOutExperiment from an array of AbstractClassifiers and a two-dimensional array of Models, which are combined into additional classifiers. If buildClassifiersByCrossProduct is true, the cross product of all Models in aMs is built to obtain these classifiers.

Parameters:
aCs - the pre-defined classifiers
aMs - the Models that are used to build additional classifiers
buildClassifiersByCrossProduct - Determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i] (i = 0, ..., k-1). Let S be the set S_0 x S_1 x ... x S_{k-1} (cross product).

true: one classifier is constructed for every element (k-tuple of models) of S
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[0], ..., aMs[k-1] must have the same length, say m. In total, m classifiers are constructed.
checkAlphabetConsistencyAndLength - indicates whether alphabets and lengths are checked for consistency
Throws:
IllegalArgumentException
WrongAlphabetException
CloneNotSupportedException
ClassDimensionException
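The two combination modes selected by buildClassifiersByCrossProduct can be illustrated with a small sketch. Here a "model" is any object and a classifier is represented simply as a list holding one model per class, in class order; the class ModelCombinationSketch and its combine method are hypothetical and do not reproduce the Jstacs internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: combining per-class models into classifiers. */
final class ModelCombinationSketch {

    /**
     * aMs[c] holds the candidate models for class c. Each returned list is one
     * classifier: one model per class, in class order.
     */
    static <M> List<List<M>> combine(M[][] aMs, boolean byCrossProduct) {
        if (aMs.length == 0) {
            throw new IllegalArgumentException("at least one class is required");
        }
        List<List<M>> classifiers = new ArrayList<>();
        if (byCrossProduct) {
            // every element of S_0 x S_1 x ... x S_{k-1}, built class by class
            classifiers.add(new ArrayList<>());
            for (M[] modelsOfClass : aMs) {
                List<List<M>> extended = new ArrayList<>();
                for (List<M> partial : classifiers) {
                    for (M model : modelsOfClass) {
                        List<M> next = new ArrayList<>(partial);
                        next.add(model);
                        extended.add(next);
                    }
                }
                classifiers = extended;
            }
        } else {
            // index-wise: classifier i = (aMs[0][i], ..., aMs[k-1][i])
            int m = aMs[0].length;
            for (M[] modelsOfClass : aMs) {
                if (modelsOfClass.length != m) {
                    throw new IllegalArgumentException("all aMs[c] must have the same length");
                }
            }
            for (int i = 0; i < m; i++) {
                List<M> classifier = new ArrayList<>();
                for (M[] modelsOfClass : aMs) {
                    classifier.add(modelsOfClass[i]);
                }
                classifiers.add(classifier);
            }
        }
        return classifiers;
    }
}
```

With two foreground models and three background models, the cross product yields 2 x 3 = 6 classifiers, while the index-wise mode rejects the same input because the two arrays differ in length.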

RepeatedHoldOutExperiment

public RepeatedHoldOutExperiment(AbstractClassifier... aCs)
                          throws IllegalArgumentException,
                                 WrongAlphabetException,
                                 CloneNotSupportedException,
                                 ClassDimensionException
Creates a new RepeatedHoldOutExperiment from a set of AbstractClassifiers.

Parameters:
aCs - contains the classifiers to be assessed
If model-based classifiers are trained, the order of models in the classifiers determines which model is trained using which sample in method assess().
For a two-class problem, it is recommended to
  • initiate the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initiate an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • give the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
IllegalArgumentException
WrongAlphabetException - if not all given classifiers are defined on the same AlphabetContainer
ClassDimensionException
CloneNotSupportedException
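The recommended class order matters because model i is trained on sample s[i]. The following toy sketch makes that mapping explicit with a hypothetical per-class model that only stores the mean of its one-dimensional training data; ClassOrderSketch, MeanModel, train and predict are illustrative names, not Jstacs classes. A prediction of 0 means foreground (positive class) only because the foreground model and the foreground data were both placed at index 0.

```java
/** Hypothetical sketch: models[i] is trained on s[i], so index 0 = foreground. */
final class ClassOrderSketch {

    /** A toy per-class "model" that only stores the mean of its training data. */
    static final class MeanModel {
        double mean;

        void train(double[] data) {
            double sum = 0.0;
            for (double x : data) {
                sum += x;
            }
            mean = sum / data.length;
        }

        /** Higher score means a better fit (negative squared distance to the mean). */
        double score(double x) {
            return -(x - mean) * (x - mean);
        }
    }

    /** Trains models[i] on s[i]; both arrays must use the same class order. */
    static void train(MeanModel[] models, double[][] s) {
        if (models.length != s.length) {
            throw new IllegalArgumentException("one sample per model/class is required");
        }
        for (int i = 0; i < models.length; i++) {
            models[i].train(s[i]);
        }
    }

    /** Returns the index of the best-scoring model, i.e. the predicted class. */
    static int predict(MeanModel[] models, double x) {
        int best = 0;
        for (int i = 1; i < models.length; i++) {
            if (models[i].score(x) > models[best].score(x)) {
                best = i;
            }
        }
        return best;
    }
}
```

Swapping only the models or only the samples would silently train the foreground model on background data, which is why the classifiers, the assessment object and the data should all follow the same order.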

RepeatedHoldOutExperiment

public RepeatedHoldOutExperiment(boolean buildClassifiersByCrossProduct,
                                 Model[]... aMs)
                          throws IllegalArgumentException,
                                 WrongAlphabetException,
                                 CloneNotSupportedException,
                                 ClassDimensionException
Creates a new RepeatedHoldOutExperiment from a set of Models. The argument buildClassifiersByCrossProduct determines how these Models are combined into classifiers.

Parameters:
buildClassifiersByCrossProduct -
Determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i] (i = 0, ..., k-1). Let S be the set S_0 x S_1 x ... x S_{k-1} (cross product).

true: one classifier is constructed for every element (k-tuple of models) of S
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[0], ..., aMs[k-1] must have the same length, say m. In total, m classifiers are constructed.
aMs -
Contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (so aMs has length k), and the second dimension (aMs[i]) contains the models for class i.
If models are trained directly (during assessment), the order of the models given when initiating this assessment object determines which sample is used to train which model. In general, the first model is trained using the first sample in s, and so on.
For a two-class problem, it is recommended to
  • initiate the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initiate an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • give the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
WrongAlphabetException - if not all given models are defined on the same AlphabetContainer
IllegalArgumentException
CloneNotSupportedException
ClassDimensionException

RepeatedHoldOutExperiment

public RepeatedHoldOutExperiment(AbstractClassifier[] aCs,
                                 boolean buildClassifiersByCrossProduct,
                                 Model[]... aMs)
                          throws IllegalArgumentException,
                                 WrongAlphabetException,
                                 CloneNotSupportedException,
                                 ClassDimensionException
This constructor allows assessing a collection of given AbstractClassifiers together with those constructed from the given Models by a RepeatedHoldOutExperiment.

Parameters:
aCs - contains AbstractClassifiers that are assessed in addition to the AbstractClassifiers constructed from the given Models
buildClassifiersByCrossProduct -
Determines how classifiers are constructed from the given models. Suppose a k-class problem. In this case, each classifier consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i] (i = 0, ..., k-1). Let S be the set S_0 x S_1 x ... x S_{k-1} (cross product).

true: one classifier is constructed for every element (k-tuple of models) of S
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all arrays aMs[0], ..., aMs[k-1] must have the same length, say m. In total, m classifiers are constructed.
aMs -
Contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (so aMs has length k), and the second dimension (aMs[i]) contains the models for class i.
If models are trained directly (during assessment), the order of the models given when initiating this assessment object determines which sample is used to train which model. In general, the first model is trained using the first sample in s, and so on.
For a two-class problem, it is recommended to
  • initiate the classifiers with models in the order (foreground model (positive class), background model (negative class))
  • initiate an assessment object using models in the order (foreground model (positive class), background model (negative class))
  • give the data s in the order (s[0] contains foreground data, s[1] contains background data)
Throws:
WrongAlphabetException - if not all given models are defined on the same AlphabetContainer
IllegalArgumentException
CloneNotSupportedException
ClassDimensionException
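For the constructor above, the number of classifiers that end up being assessed follows from the combination rules for buildClassifiersByCrossProduct. The sketch below computes it, assuming that every classifier in aCs is assessed in addition to the model-built ones, as the parameter description of aCs states; ClassifierCountSketch is a hypothetical helper, not part of Jstacs.

```java
/** Hypothetical sketch: number of classifiers assessed by the combined constructor. */
final class ClassifierCountSketch {

    /**
     * aCsLength: number of pre-defined classifiers; modelsPerClass[c]: length of aMs[c].
     * Assumes the pre-defined classifiers are simply assessed alongside the built ones.
     */
    static long countClassifiers(int aCsLength, int[] modelsPerClass, boolean byCrossProduct) {
        long built;
        if (modelsPerClass.length == 0) {
            built = 0; // no models given, so no additional classifiers
        } else if (byCrossProduct) {
            built = 1;
            for (int n : modelsPerClass) {
                built *= n; // |S_0| * |S_1| * ... * |S_{k-1}|
            }
        } else {
            built = modelsPerClass[0]; // m classifiers, one per index i
            for (int n : modelsPerClass) {
                if (n != built) {
                    throw new IllegalArgumentException("index-wise combination needs equal lengths");
                }
            }
        }
        return aCsLength + built;
    }
}
```

For example, one pre-defined classifier plus aMs with 2 and 3 models gives 1 + 2 x 3 = 7 classifiers with the cross product, while one pre-defined classifier plus two classes of 2 models each gives 1 + 2 = 3 in index-wise mode.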
Method Detail

evaluateClassifier

protected boolean evaluateClassifier(MeasureParameters mp,
                                     ClassifierAssessmentAssessParameterSet assessPS,
                                     Sample[] s,
                                     ProgressUpdater pU)
                              throws IllegalArgumentException,
                                     Exception
Description copied from class: ClassifierAssessment
This method must be implemented in all subclasses. It should perform the following tasks:
1.) create test and training data sets
2.) call method train to train the classifiers/models using the training data
3.) call method test to evaluate (test) the trained classifiers

Specified by:
evaluateClassifier in class ClassifierAssessment
Parameters:
mp - defines which performance-measures are used to assess classifiers
assessPS - contains assessment-specific parameters (e.g., the number of iterations of a k-fold cross-validation)
s - data to be used for assessment (both: test- and train-data)
pU - a ProgressUpdater that is mainly used to allow the user to cancel a currently running classifier assessment. This ProgressUpdater is guaranteed to be non-null. In certain cases, aborting a classifier assessment is not allowed, for example in the case of KFoldCrossValidation. In such cases, the given ProgressUpdater should be ignored.

Usage:
  • pU.setMax(number of iterations of the assessment loop);
  • iteration = 0;
  • assessment loop:
    • pU.setValue(iteration + 1);
    • Sample treatment (create test and training data)
    • train();
    • test();
    • iteration++;
  • repeat while (not ready and not pU.isCanceled())
Returns:
true if no errors occurred
Throws:
IllegalArgumentException - if the given AssessParameterSet is of wrong type
Exception - if an error occurred while training or using the classifiers/models
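The usage pattern for the ProgressUpdater can be sketched as a plain Java loop. Progress is a hypothetical stand-in interface that declares only the three calls named in the usage list (setMax, setValue, isCanceled); it is not the Jstacs ProgressUpdater class, and the step callback is an assumed placeholder for the sample treatment, train() and test() of one iteration. The loop stops when all iterations are done or the user cancels, and returns false as soon as an iteration reports an error.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of the evaluateClassifier loop; not the Jstacs implementation. */
final class AssessmentLoopSketch {

    /** Minimal stand-in for a progress updater (hypothetical interface). */
    interface Progress {
        void setMax(int max);
        void setValue(int value);
        boolean isCanceled();
    }

    /**
     * Runs up to `repeats` iterations. `step` performs the sample treatment,
     * train() and test() of one iteration and returns false on error.
     */
    static boolean run(int repeats, Progress pU, IntPredicate step) {
        pU.setMax(repeats);
        int iteration = 0;
        // repeat while not ready and not canceled
        while (iteration < repeats && !pU.isCanceled()) {
            pU.setValue(iteration + 1);
            if (!step.test(iteration)) {
                return false; // an error occurred in this iteration
            }
            iteration++;
        }
        return true; // no errors occurred (a canceled run also counts as error-free here)
    }
}
```

Treating a user cancellation as an error-free run is an assumption of this sketch; an assessment that must not be aborted, such as KFoldCrossValidation, would simply never consult isCanceled().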