de.jstacs.classifiers.assessment
Class KFoldCrossValidation

java.lang.Object
  extended by de.jstacs.classifiers.assessment.ClassifierAssessment
      extended by de.jstacs.classifiers.assessment.KFoldCrossValidation

public class KFoldCrossValidation
extends ClassifierAssessment

This class implements a k-fold cross-validation. A k-fold cross-validation assesses classifiers using the following methodology: the user supplies one dataset for each class the classifiers are able to distinguish. Each dataset is randomly partitioned into k mutually exclusive parts. Each of these parts is used once as the test dataset while the remaining k-1 parts are used as the training dataset. In each of the k iterations, the training datasets are used to train the classifiers and the test datasets are used to assess how well the classifiers predict the elements therein. Additionally, the user has to define which assessment measures should be used.
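
The following minimal sketch shows a typical two-class use of this class following the recommendations given below (foreground data first, background data second). It is a sketch, not a complete program: the classifier, the performance measures, the assessment parameters and the data are assumed to be created elsewhere, and the import locations as well as the signature of the inherited assess method are assumptions based on the Jstacs package layout.

import de.jstacs.classifiers.AbstractClassifier;
import de.jstacs.classifiers.assessment.KFoldCrossValidation;
import de.jstacs.classifiers.assessment.KFoldCrossValidationAssessParameterSet;
import de.jstacs.classifiers.performanceMeasures.NumericalPerformanceMeasureParameterSet;
import de.jstacs.data.DataSet;
import de.jstacs.results.ListResult;

public class KFoldExample {

    /**
     * Assesses one classifier of a two-class problem by a k-fold cross-validation.
     * The classifier, the measures, the assessment parameters (including k and the
     * element length) and the data are created elsewhere and passed in.
     */
    public static ListResult run( AbstractClassifier classifier,
                                  NumericalPerformanceMeasureParameterSet measures,
                                  KFoldCrossValidationAssessParameterSet params,
                                  DataSet fg, DataSet bg ) throws Exception {

        // create the assessment for the given (already constructed) classifier
        KFoldCrossValidation cv = new KFoldCrossValidation( classifier );

        // recommended order for a two-class problem:
        // s[0] = foreground (positive class) data, s[1] = background (negative class) data
        return cv.assess( measures, params, fg, bg );
    }
}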

Author:
Andre Gohr (bioinf (nospam:.) ag (nospam:@) googlemail (nospam:.) com), Jens Keilwagen

Field Summary
 
Fields inherited from class de.jstacs.classifiers.assessment.ClassifierAssessment
myAbstractClassifier, myModel, myTempMeanResultSets, skipLastClassifiersDuringClassifierTraining
 
Constructor Summary
  KFoldCrossValidation(AbstractClassifier... aCs)
          Creates a new KFoldCrossValidation from a set of AbstractClassifiers.
  KFoldCrossValidation(AbstractClassifier[] aCs, boolean buildClassifiersByCrossProduct, TrainableStatisticalModel[]... aMs)
          This constructor allows assessing a collection of given AbstractClassifiers, together with classifiers constructed from the given TrainableStatisticalModels, by a KFoldCrossValidation.
protected KFoldCrossValidation(AbstractClassifier[] aCs, TrainableStatisticalModel[][] aMs, boolean buildClassifiersByCrossProduct, boolean checkAlphabetConsistencyAndLength)
          Creates a new KFoldCrossValidation from an array of AbstractClassifiers and a two-dimensional array of TrainableStatisticalModels, which are combined into additional classifiers.
  KFoldCrossValidation(boolean buildClassifiersByCrossProduct, TrainableStatisticalModel[]... aMs)
          Creates a new KFoldCrossValidation from a set of TrainableStatisticalModels.
 
Method Summary
 ListResult assessWithPredefinedSplits(NumericalPerformanceMeasureParameterSet mp, ClassifierAssessmentAssessParameterSet caaps, ProgressUpdater pU, DataSet[]... splitData)
          This method implements a k-fold cross-validation on previously split data.
protected  void evaluateClassifier(NumericalPerformanceMeasureParameterSet mp, ClassifierAssessmentAssessParameterSet assessPS, DataSet[] s, ProgressUpdater pU)
          Evaluates a classifier.
 
Methods inherited from class de.jstacs.classifiers.assessment.ClassifierAssessment
assess, assess, assess, getClassifier, getNameOfAssessment, prepareAssessment, test, train
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KFoldCrossValidation

protected KFoldCrossValidation(AbstractClassifier[] aCs,
                               TrainableStatisticalModel[][] aMs,
                               boolean buildClassifiersByCrossProduct,
                               boolean checkAlphabetConsistencyAndLength)
                        throws IllegalArgumentException,
                               WrongAlphabetException,
                               CloneNotSupportedException,
                               ClassDimensionException
Creates a new KFoldCrossValidation from an array of AbstractClassifiers and a two-dimensional array of TrainableStatisticalModels, which are combined into additional classifiers. If buildClassifiersByCrossProduct is true, the cross-product of all TrainableStatisticalModels in aMs is built to obtain these classifiers.

Parameters:
aCs - the predefined classifiers
aMs - the TrainableStatisticalModels that are used to build additional classifiers
buildClassifiersByCrossProduct - determines how classifiers are constructed from the given models (see the sketch after this constructor's description). Suppose a k-class problem; each classifier then consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i], and let S = S_1 x S_2 x ... x S_k be their cross-product.

true: one classifier is constructed for each element of S, i.e. for each possible combination of k models
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all second dimensions of aMs have to be of equal length, say m, and in total m classifiers are constructed.
checkAlphabetConsistencyAndLength - indicates if alphabets and lengths shall be checked for consistency
Throws:
IllegalArgumentException - if the classifiers have different lengths
WrongAlphabetException - if the classifiers use different alphabets
CloneNotSupportedException - if something went wrong while cloning
ClassDimensionException - if there is something wrong with the class dimension of the classifier
See Also:
ClassifierAssessment.ClassifierAssessment(AbstractClassifier[], TrainableStatisticalModel[][], boolean, boolean)
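
To make the two combination modes concrete, the following self-contained sketch enumerates the model combinations that would become classifiers for a given aMs. It uses plain strings in place of TrainableStatisticalModels and does not depend on Jstacs; it only illustrates the counting described above.

import java.util.ArrayList;
import java.util.List;

public class CombinationModes {

    /** Enumerates the combinations as for buildClassifiersByCrossProduct = true:
     *  one combination per element of S_1 x S_2 x ... x S_k. */
    static List<String[]> crossProduct( String[][] aMs ) {
        List<String[]> result = new ArrayList<>();
        collect( aMs, 0, new String[aMs.length], result );
        return result;
    }

    private static void collect( String[][] aMs, int clazz, String[] current, List<String[]> result ) {
        if( clazz == aMs.length ) {
            result.add( current.clone() );
            return;
        }
        for( String model : aMs[clazz] ) {
            current[clazz] = model;
            collect( aMs, clazz + 1, current, result );
        }
    }

    /** Enumerates the combinations as for buildClassifiersByCrossProduct = false:
     *  the i-th classifier uses aMs[0][i], aMs[1][i], ..., aMs[k-1][i];
     *  all second dimensions must have the same length m. */
    static List<String[]> parallel( String[][] aMs ) {
        int m = aMs[0].length;
        List<String[]> result = new ArrayList<>();
        for( int i = 0; i < m; i++ ) {
            String[] combination = new String[aMs.length];
            for( int clazz = 0; clazz < aMs.length; clazz++ ) {
                combination[clazz] = aMs[clazz][i];
            }
            result.add( combination );
        }
        return result;
    }

    public static void main( String[] args ) {
        // two-class problem with two foreground and two background models
        String[][] aMs = { { "fg-A", "fg-B" }, { "bg-A", "bg-B" } };
        System.out.println( crossProduct( aMs ).size() ); // 4 classifiers (2 x 2)
        System.out.println( parallel( aMs ).size() );     // 2 classifiers (m = 2)
    }
}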

KFoldCrossValidation

public KFoldCrossValidation(AbstractClassifier... aCs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
Creates a new KFoldCrossValidation from a set of AbstractClassifiers.

Parameters:
aCs - contains the classifiers to be assessed.
If model-based classifiers are trained, the order of the models in the classifiers determines which model will be trained using which dataset in the method assess( ... ).
For a two-class problem, it is recommended
  • to initialize the classifiers with models in the order (foreground model (positive class), background model (negative class)),
  • to initialize an assessment object using models in the order (foreground model (positive class), background model (negative class)),
  • to pass the data s in the order (s[0] contains the foreground data, s[1] contains the background data).
Throws:
IllegalArgumentException - if the classifiers have different lengths
WrongAlphabetException - if not all given classifiers are defined on the same AlphabetContainer
CloneNotSupportedException - if something went wrong while cloning
ClassDimensionException - if there is something wrong with the class dimension of the classifier
See Also:
ClassifierAssessment.ClassifierAssessment(AbstractClassifier...)

KFoldCrossValidation

public KFoldCrossValidation(boolean buildClassifiersByCrossProduct,
                            TrainableStatisticalModel[]... aMs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
Creates a new KFoldCrossValidation from a set of TrainableStatisticalModels. The argument buildClassifiersByCrossProduct determines how these TrainableStatisticalModels are combined into classifiers.

Parameters:
buildClassifiersByCrossProduct -
determines how classifiers are constructed from the given models. Suppose a k-class problem; each classifier then consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i], and let S = S_1 x S_2 x ... x S_k be their cross-product.

true: one classifier is constructed for each element of S, i.e. for each possible combination of k models
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all second dimensions of aMs have to be of equal length, say m, and in total m classifiers are constructed.
aMs -
contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (hence its length is k), and the second dimension (aMs[i]) contains the models for class i.
If the models are trained directly (during the assessment), the order of the given models during the initialization of this assessment object determines which dataset will be used for training which model. In general, the first model will be trained using the first dataset in s.
For a two-class problem, it is recommended
  • to initialize the classifiers with models in the order (foreground model (positive class), background model (negative class)),
  • to initialize an assessment object using models in the order (foreground model (positive class), background model (negative class)),
  • to pass the data s in the order (s[0] contains the foreground data, s[1] contains the background data).
Throws:
IllegalArgumentException - if the classifiers have different lengths
WrongAlphabetException - if not all given classifiers are defined on the same AlphabetContainer
CloneNotSupportedException - if something went wrong while cloning
ClassDimensionException - if there is something wrong with the class dimension of the classifier
See Also:
ClassifierAssessment.ClassifierAssessment(boolean, TrainableStatisticalModel[][])
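
As a minimal sketch of this constructor, the helper below builds a KFoldCrossValidation for a two-class problem directly from models. The creation of the foreground and background models is not shown, and the import location of TrainableStatisticalModel is an assumption based on the Jstacs package layout.

import de.jstacs.classifiers.assessment.KFoldCrossValidation;
import de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel;

public class FromModelsExample {

    /**
     * Builds a KFoldCrossValidation for a two-class problem directly from models.
     * fgModels and bgModels are created elsewhere; the recommended order is
     * foreground (positive class) first, background (negative class) second.
     */
    public static KFoldCrossValidation create( TrainableStatisticalModel[] fgModels,
                                               TrainableStatisticalModel[] bgModels ) throws Exception {
        // true: build all combinations (cross-product) of foreground and background models,
        // i.e. fgModels.length * bgModels.length classifiers in total
        return new KFoldCrossValidation( true, fgModels, bgModels );
    }
}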

KFoldCrossValidation

public KFoldCrossValidation(AbstractClassifier[] aCs,
                            boolean buildClassifiersByCrossProduct,
                            TrainableStatisticalModel[]... aMs)
                     throws IllegalArgumentException,
                            WrongAlphabetException,
                            CloneNotSupportedException,
                            ClassDimensionException
This constructor allows assessing a collection of given AbstractClassifiers, together with classifiers constructed from the given TrainableStatisticalModels, by a KFoldCrossValidation.

Parameters:
aCs - contains some AbstractClassifiers that should be assessed in addition to the AbstractClassifiers constructed from the given TrainableStatisticalModels
buildClassifiersByCrossProduct -
determines how classifiers are constructed from the given models. Suppose a k-class problem; each classifier then consists of k models, one responsible for each class.
Let S_i be the set of all models in aMs[i], and let S = S_1 x S_2 x ... x S_k be their cross-product.

true: one classifier is constructed for each element of S, i.e. for each possible combination of k models
false: for each fixed i, one classifier consisting of the models aMs[0][i], aMs[1][i], ..., aMs[k-1][i] is constructed. In this case, all second dimensions of aMs have to be of equal length, say m, and in total m classifiers are constructed.
aMs -
contains the models in the following way (suppose a k-class problem): the first dimension encodes the class (hence its length is k), and the second dimension (aMs[i]) contains the models for class i.
If the models are trained directly (during the assessment), the order of the given models during the initialization of this assessment object determines which dataset will be used for training which model. In general, the first model will be trained using the first dataset in s.
For a two-class problem, it is recommended
  • to initialize the classifiers with models in the order (foreground model (positive class), background model (negative class)),
  • to initialize an assessment object using models in the order (foreground model (positive class), background model (negative class)),
  • to pass the data s in the order (s[0] contains the foreground data, s[1] contains the background data).
Throws:
IllegalArgumentException - if the classifiers have different lengths
WrongAlphabetException - if not all given classifiers are defined on the same AlphabetContainer
CloneNotSupportedException - if something went wrong while cloning
ClassDimensionException - if there is something wrong with the class dimension of the classifier
See Also:
ClassifierAssessment.ClassifierAssessment(AbstractClassifier[], boolean, TrainableStatisticalModel[][])
Method Detail

evaluateClassifier

protected void evaluateClassifier(NumericalPerformanceMeasureParameterSet mp,
                                  ClassifierAssessmentAssessParameterSet assessPS,
                                  DataSet[] s,
                                  ProgressUpdater pU)
                           throws IllegalArgumentException,
                                  Exception
Evaluates the classifiers by a k-fold cross-validation on the given data.

Specified by:
evaluateClassifier in class ClassifierAssessment
Parameters:
mp - defines which performance measures are used to assess classifiers
pU - the progress updater which shows the progress of the k-fold cross-validation
s - contains the data to be used for the assessment. The order of the datasets is important.
If model-based classifiers are trained, the order of the models in the classifiers determines which model will be trained using which dataset: the first model in the classifier will be trained using the first dataset in s. If the models are trained directly, the order of the given models during the initialization of this assessment object determines which dataset will be used for training which model. In general, the first model will be trained using the first dataset in s.
For a two-class problem, it is recommended
  • to initialize the classifiers with models in the order (foreground model (positive class), background model (negative class)),
  • to initialize an assessment object using models in the order (foreground model (positive class), background model (negative class)),
  • to pass the data s in the order (s[0] contains the foreground data, s[1] contains the background data).
assessPS - contains parameters for a run of this KFoldCrossValidation. Must be of type KFoldCrossValidationAssessParameterSet.
Throws:
IllegalArgumentException - if the given assessPS is not of type KFoldCrossValidationAssessParameterSet
Exception - if something went wrong
See Also:
ClassifierAssessment.evaluateClassifier(NumericalPerformanceMeasureParameterSet, ClassifierAssessmentAssessParameterSet, DataSet[], ProgressUpdater)

assessWithPredefinedSplits

public ListResult assessWithPredefinedSplits(NumericalPerformanceMeasureParameterSet mp,
                                             ClassifierAssessmentAssessParameterSet caaps,
                                             ProgressUpdater pU,
                                             DataSet[]... splitData)
                                      throws Exception
This method implements a k-fold cross-validation on previously split data. This might be useful if you would like to compare the results of your classifier(s) with those of a previous study in a paper or manuscript.

Parameters:
mp - defines which performance measures are used to assess classifiers
caaps - contains the defined element length and the choice of whether an exception should be thrown if a performance measure could not be computed
pU - the progress updater which shows the progress of the k-fold cross-validation
splitData - the previously split data; splitData[i] contains the splits for class i, and all subarrays splitData[i] therefore have to be of identical length (the number of splits)
Returns:
the result packed in a ListResult
Throws:
Exception - if something went wrong
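
A minimal sketch of how this method might be called for a two-class problem is given below. It assumes that the splits, the parameter sets and a ProgressUpdater have been created elsewhere, and that the import locations follow the Jstacs package layout.

import de.jstacs.classifiers.assessment.ClassifierAssessmentAssessParameterSet;
import de.jstacs.classifiers.assessment.KFoldCrossValidation;
import de.jstacs.classifiers.performanceMeasures.NumericalPerformanceMeasureParameterSet;
import de.jstacs.data.DataSet;
import de.jstacs.results.ListResult;
import de.jstacs.utils.ProgressUpdater;

public class PredefinedSplitsExample {

    /**
     * Runs a k-fold cross-validation on splits that were fixed beforehand, for instance
     * taken from a previous study. fgSplits[j] and bgSplits[j] are the j-th split of the
     * foreground and the background data; both arrays must have the same length k.
     */
    public static ListResult run( KFoldCrossValidation cv,
                                  NumericalPerformanceMeasureParameterSet measures,
                                  ClassifierAssessmentAssessParameterSet caaps,
                                  ProgressUpdater pU,
                                  DataSet[] fgSplits, DataSet[] bgSplits ) throws Exception {

        // splitData[0] contains the splits of class 0 (foreground),
        // splitData[1] contains the splits of class 1 (background)
        return cv.assessWithPredefinedSplits( measures, caaps, pU, fgSplits, bgSplits );
    }
}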