de.jstacs.classifier
Class AbstractScoreBasedClassifier

java.lang.Object
  extended by de.jstacs.classifier.AbstractClassifier
      extended by de.jstacs.classifier.AbstractScoreBasedClassifier
All Implemented Interfaces:
Storable, Cloneable
Direct Known Subclasses:
MappingClassifier, ModelBasedClassifier, ScoreClassifier

public abstract class AbstractScoreBasedClassifier
extends AbstractClassifier

This class is the main class for all score-based classifiers. Score-based classifiers make it easy to compute many different measures. For instance, one can use the R package "ROCR" to compute or plot many of them.
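
As an illustration of a measure that can be derived from scores alone, the following sketch computes the area under the ROC curve from two arrays of scores. It assumes that higher scores indicate the foreground class. The class RocAucSketch and its methods are hypothetical helpers and not part of Jstacs.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Jstacs. */
final class RocAucSketch {

    /** Index of the first element of the sorted array that is >= key. */
    static int lowerBound(double[] sorted, double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Index of the first element of the sorted array that is > key. */
    static int upperBound(double[] sorted, double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /**
     * AUC as the probability that a random foreground score exceeds a random
     * background score; ties between the two count one half.
     */
    static double auc(double[] fgScores, double[] bgScores) {
        if (fgScores.length == 0 || bgScores.length == 0) {
            throw new IllegalArgumentException("both score arrays must be non-empty");
        }
        double[] bg = bgScores.clone();
        Arrays.sort(bg);
        double wins = 0;
        for (double s : fgScores) {
            int below = lowerBound(bg, s);        // background scores strictly below s
            int ties = upperBound(bg, s) - below; // background scores equal to s
            wins += below + 0.5 * ties;
        }
        return wins / ((double) fgScores.length * bg.length);
    }

    public static void main(String[] args) {
        double[] fg = {2.0, 0.5};
        double[] bg = {1.0, 3.0};
        System.out.println(auc(fg, bg));
    }
}
```

Sorting the background scores once keeps the computation at O((n + m) log m) instead of comparing every pair of scores.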

Author:
Jens Keilwagen, Jan Grau, Andre Gohr

Nested Class Summary
static class AbstractScoreBasedClassifier.DoubleTableResult
          This class is for a table of doubles.
 
Constructor Summary
AbstractScoreBasedClassifier(AlphabetContainer abc, int classes)
          The constructor for a homogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int classes, double classWeight)
          The constructor for a homogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int length, int classes)
          The constructor for an inhomogeneous classifier.
AbstractScoreBasedClassifier(AlphabetContainer abc, int length, int classes, double classWeight)
          The constructor for an inhomogeneous classifier.
AbstractScoreBasedClassifier(StringBuffer xml)
          The constructor for the Storable interface.
 
Method Summary
protected  void check(Sample s)
          This method checks if the given sample can be used.
protected  void check(Sequence seq)
          This method checks if the given sequence can be used.
 byte classify(Sequence seq)
          This method classifies a sequence and returns the index i, with 0 <= i < getNumberOfClasses(), of the class to which the sequence is assigned.
protected  byte classify(Sequence seq, boolean check)
          This method classifies a sequence.
 AbstractScoreBasedClassifier clone()
           
protected  void createDefaultClassWeights(int classes, double val)
          This method creates new class weights.
protected  void extractFurtherClassifierInfosFromXML(StringBuffer xml)
          Extracts further information of a classifier from an xml-representation.
protected  double getClassWeight(int index)
          Returns the class weight for class index.
 double[] getClassWeights()
          Returns the specific class weights of an AbstractScoreBasedClassifier.
protected  StringBuffer getFurtherClassifierInfos()
          This method returns further information of a classifier as a StringBuffer.
 int getNumberOfClasses()
          Returns the number of classes that can be distinguished.
 double[] getPValue(Sample candidates, Sample bg)
          Returns the p-values for all sequences in candidates with respect to a given background sample.
 double getPValue(Sequence candidate, Sample bg)
          Returns the p-value for a sequence candidate with respect to a given background sample.
protected  LinkedList<? extends Result> getResults(Sample[] s, MeasureParameters params, boolean exceptionIfNotComputeable, boolean all)
          This method computes the results for any evaluation of the classifier.
 double getScore(Sequence seq, int i)
          This method returns the score for a given sequence and a given class.
protected abstract  double getScore(Sequence seq, int i, boolean check)
          This method returns the score for a given sequence and a given class.
 double[] getScores(Sample s)
          This method returns the scores of the classifier for each sequence in the sample.
 void setClassWeights(boolean add, double... weights)
          Sets new class weights.
 void setThresholdClassWeights(boolean add, double t)
          Sets a new threshold for 2-class-classifiers.
 ConfusionMatrix test(Sample... testData)
          This method computes the confusion matrix for a given array of test data
 
Methods inherited from class de.jstacs.classifier.AbstractClassifier
classify, evaluate, evaluateAll, getAlphabetContainer, getCharacteristics, getClassificationRate, getClassifierAnnotation, getInstanceName, getLength, getMeasuresForEvaluate, getMeasuresForEvaluateAll, getNumericalCharacteristics, getXMLTag, isTrained, setNewAlphabetContainerInstance, toXML, train, train
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int classes)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length. The class weights are set initially to 0.

Parameters:
abc - the alphabets that are used
classes - the number of different classes

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int classes,
                                    double classWeight)
The constructor for a homogeneous classifier. Such a classifier can handle sequences of arbitrary length. The class weights are set initially to classWeight.

Parameters:
abc - the alphabets that are used
classes - the number of different classes
classWeight - the value of all class weights

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int length,
                                    int classes)
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length. The class weights are set initially to 0.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
classes - the number of different classes
Throws:
IllegalArgumentException - if the length and the possible length of the AlphabetContainer do not match

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(AlphabetContainer abc,
                                    int length,
                                    int classes,
                                    double classWeight)
The constructor for an inhomogeneous classifier. Such a classifier can handle sequences of fixed length. The class weights are set initially to classWeight.

Parameters:
abc - the alphabets that are used
length - the length of the sequences that can be classified
classes - the number of different classes
classWeight - the value of all class weights
Throws:
IllegalArgumentException - if the length and the possible length of the AlphabetContainer do not match

AbstractScoreBasedClassifier

public AbstractScoreBasedClassifier(StringBuffer xml)
                             throws NonParsableException
The constructor for the Storable interface.

Parameters:
xml - the StringBuffer
Throws:
NonParsableException - if the StringBuffer is not parsable
Method Detail

clone

public AbstractScoreBasedClassifier clone()
                                   throws CloneNotSupportedException
Overrides:
clone in class AbstractClassifier
Throws:
CloneNotSupportedException

classify

public byte classify(Sequence seq)
              throws Exception
Description copied from class: AbstractClassifier
This method classifies a sequence and returns the index i, with 0 <= i < getNumberOfClasses(), of the class to which the sequence is assigned.

This method should check that the sequence is defined over the underlying alphabet and length.

Specified by:
classify in class AbstractClassifier
Parameters:
seq - the sequence to be classified
Returns:
the index of the class to which the sequence is assigned
Throws:
Exception - if the classifier is not trained or something is wrong with the sequence

getResults

protected LinkedList<? extends Result> getResults(Sample[] s,
                                                  MeasureParameters params,
                                                  boolean exceptionIfNotComputeable,
                                                  boolean all)
                                           throws Exception
Description copied from class: AbstractClassifier
This method computes the results for any evaluation of the classifier.

Overrides:
getResults in class AbstractClassifier
Parameters:
s - the array of Samples
params - the current parameters
exceptionIfNotComputeable - if true the method throws an exception if a measure could not be computed, otherwise it is ignored
all - if true the method computes all results, if false it computes only the numerical results
Returns:
a list of results
Throws:
Exception - if something went wrong
See Also:
AbstractClassifier.evaluate(MeasureParameters, boolean, Sample...), AbstractClassifier.evaluateAll(MeasureParameters, boolean, Sample...)

getClassWeights

public double[] getClassWeights()
Returns the specific class weights of an AbstractScoreBasedClassifier.

Returns:
the classWeights

getNumberOfClasses

public int getNumberOfClasses()
Description copied from class: AbstractClassifier
Returns the number of classes that can be distinguished. For example, if the classifier distinguishes between foreground and background, this method should return 2, even if a mixture model is used for either foreground or background.

Specified by:
getNumberOfClasses in class AbstractClassifier
Returns:
the number of classes that can be distinguished

getScore

public double getScore(Sequence seq,
                       int i)
                throws Exception
This method returns the score for a given sequence and a given class.

Parameters:
seq - the sequence
i - the index of the class
Returns:
the score
Throws:
Exception

setClassWeights

public final void setClassWeights(boolean add,
                                  double... weights)
                           throws ClassDimensionException
Sets new class weights.
The logarithmic probabilities of an item i given class 0 to class n are computed to classify this item into a class. The class weights are added to each of these logarithmic probabilities. The higher the class weight of class j relative to the other class weights, the more likely it becomes that an item is classified into this class.
Class weights do not have to be logarithmic probabilities. However, if sum_{j=0..n} exp(classWeight_j) = 1, the class weights may be interpreted as logarithmic a priori class probabilities.

Parameters:
add - if true the class weights are added to the current class weights
weights - the array of weights, for each class the weight that is added in a classification
Throws:
ClassDimensionException
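
The effect of class weights described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Jstacs implementation: the class ClassWeightSketch, its tie-breaking rule (lowest index wins) and the use of IllegalArgumentException instead of ClassDimensionException are all hypothetical.

```java
/** Hypothetical stand-in for illustration; not the Jstacs implementation. */
final class ClassWeightSketch {

    private final double[] classWeights;

    ClassWeightSketch(int classes) {
        classWeights = new double[classes]; // all weights start at 0
    }

    /** Replaces the current weights, or adds to them if add is true. */
    void setClassWeights(boolean add, double... weights) {
        if (weights.length != classWeights.length) {
            // assumption: the real method reports this case differently
            throw new IllegalArgumentException("one weight per class is required");
        }
        for (int j = 0; j < weights.length; j++) {
            classWeights[j] = add ? classWeights[j] + weights[j] : weights[j];
        }
    }

    /** Returns the class index with the highest log score plus class weight. */
    int classify(double... logScores) {
        int best = 0;
        for (int j = 1; j < logScores.length; j++) {
            if (logScores[j] + classWeights[j] > logScores[best] + classWeights[best]) {
                best = j;
            }
        }
        return best;
    }

    /** Turns a priori class probabilities into logarithmic class weights. */
    static double[] logPriors(double... priors) {
        double[] w = new double[priors.length];
        for (int j = 0; j < priors.length; j++) {
            w[j] = Math.log(priors[j]);
        }
        return w;
    }

    public static void main(String[] args) {
        ClassWeightSketch c = new ClassWeightSketch(2);
        System.out.println(c.classify(-1.0, -1.5)); // weights are still 0
        c.setClassWeights(false, logPriors(0.1, 0.9));
        System.out.println(c.classify(-1.0, -1.5)); // class 1 now has a larger prior
    }
}
```

Because only the differences between class weights matter for the decision, adding the same constant to every weight leaves the classification unchanged.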

setThresholdClassWeights

public final void setThresholdClassWeights(boolean add,
                                           double t)
                                    throws OperationNotSupportedException
Sets a new threshold for 2-class-classifiers.
Only available if this AbstractScoreBasedClassifier distinguishes between 2 classes, 0 and 1. In this case, t will be interpreted as log(P(class 1)/P(class 0)). A large t (greater than 0) makes the classifier decide more often for class 1. A small t (less than 0) makes the classifier decide more often for class 0.

Parameters:
add - if true the class weights are added to the current class weights
t - the new threshold
Throws:
OperationNotSupportedException - if the classifier is not a 2-class classifier
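
The role of the threshold t can be sketched with a two-class decision rule. The class ThresholdSketch is hypothetical, and the rule shown here (add t to the score of class 1, decide for class 0 on ties) is only one way to realize the behavior described above; how Jstacs distributes t over the two class weights is not specified in this documentation.

```java
/** Hypothetical two-class decision rule for illustration; not part of Jstacs. */
final class ThresholdSketch {

    /** t = log(P(class 1) / P(class 0)) for given a priori probabilities. */
    static double thresholdFromPriors(double p0, double p1) {
        return Math.log(p1 / p0);
    }

    /** Decides for class 1 if its score plus t exceeds the score of class 0. */
    static int decide(double score0, double score1, double t) {
        return score1 + t > score0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double t = thresholdFromPriors(0.2, 0.8); // positive, favors class 1
        System.out.println(t);
        System.out.println(decide(-1.0, -1.2, 0.0));
        System.out.println(decide(-1.0, -1.2, t));
    }
}
```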

test

public ConfusionMatrix test(Sample... testData)
                     throws Exception
Description copied from class: AbstractClassifier
This method computes the confusion matrix for a given array of test data

Overrides:
test in class AbstractClassifier
Parameters:
testData - the data
Returns:
the confusion matrix
Throws:
ClassDimensionException - if the number of samples is incorrect
Exception - if something went wrong
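
As a rough sketch of what a confusion matrix holds, the following code counts predictions per true class. It assumes one array of predicted class indices per class, which mirrors the assumption that testData contains one sample per class. The class ConfusionMatrixSketch and the orientation (rows are true classes, columns are predicted classes) are hypothetical and may differ from the layout of ConfusionMatrix.

```java
/** Hypothetical helper for illustration; not the Jstacs ConfusionMatrix. */
final class ConfusionMatrixSketch {

    /**
     * Builds a matrix m where m[trueClass][predictedClass] counts how often an
     * item of trueClass was assigned to predictedClass.
     */
    static int[][] confusionMatrix(int numberOfClasses, int[][] predictedPerClass) {
        if (predictedPerClass.length != numberOfClasses) {
            throw new IllegalArgumentException("expected one array of predictions per class");
        }
        int[][] matrix = new int[numberOfClasses][numberOfClasses];
        for (int trueClass = 0; trueClass < numberOfClasses; trueClass++) {
            for (int predicted : predictedPerClass[trueClass]) {
                matrix[trueClass][predicted]++;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        int[][] predictions = {
            {0, 0, 1},    // predictions for items that truly belong to class 0
            {1, 1, 0, 1}  // predictions for items that truly belong to class 1
        };
        int[][] m = confusionMatrix(2, predictions);
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```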

getFurtherClassifierInfos

protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method toXML() and should not be made public.

Specified by:
getFurtherClassifierInfos in class AbstractClassifier
Returns:
further information of a classifier as a StringBuffer

check

protected void check(Sample s)
              throws NotTrainedException,
                     IllegalArgumentException
This method checks if the given sample can be used.

Parameters:
s - the sample to be checked
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with s
See Also:
AbstractClassifier.setNewAlphabetContainerInstance(AlphabetContainer)

check

protected void check(Sequence seq)
              throws NotTrainedException,
                     IllegalArgumentException
This method checks if the given sequence can be used.

Parameters:
seq - the sequence to be checked
Throws:
NotTrainedException - if the classifier is not trained
IllegalArgumentException - if something is wrong with seq

classify

protected byte classify(Sequence seq,
                        boolean check)
                 throws Exception
This method classifies a sequence. It enables you to check the constraints (alphabets, length, isTrained()).

Parameters:
seq - the sequence
check - if true the constraints will be checked
Returns:
the index of the class the sequence is assigned to
Throws:
Exception - if something went wrong
See Also:
check(Sequence)

createDefaultClassWeights

protected void createDefaultClassWeights(int classes,
                                         double val)
                                  throws IllegalArgumentException
This method creates new class weights. Each class weight has the same value val, so the class weights do not have any influence on the classification.

Parameters:
classes - the number of different classes
val - the value that is used for all classes
Throws:
IllegalArgumentException - if the number of classes is below 2
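
A minimal sketch of why equal class weights have no influence: adding the same value to every class score does not change which class has the highest score. The class DefaultClassWeightsSketch and its argMax helper are hypothetical and only mirror the behavior described above.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Jstacs. */
final class DefaultClassWeightsSketch {

    /** Creates one weight per class, all set to val. */
    static double[] createDefaultClassWeights(int classes, double val) {
        if (classes < 2) {
            throw new IllegalArgumentException("at least 2 classes are required");
        }
        double[] weights = new double[classes];
        Arrays.fill(weights, val);
        return weights;
    }

    /** Index of the largest score plus weight; the lowest index wins ties. */
    static int argMax(double[] scores, double[] weights) {
        int best = 0;
        for (int j = 1; j < scores.length; j++) {
            if (scores[j] + weights[j] > scores[best] + weights[best]) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {-3.0, -1.0, -2.0};
        // Both calls select the same class because the weights are uniform.
        System.out.println(argMax(scores, createDefaultClassWeights(3, 0.0)));
        System.out.println(argMax(scores, createDefaultClassWeights(3, 5.0)));
    }
}
```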

extractFurtherClassifierInfosFromXML

protected void extractFurtherClassifierInfosFromXML(StringBuffer xml)
                                             throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an xml-representation. This method is used by the method fromXML( StringBuffer ) and should not be made public.

Specified by:
extractFurtherClassifierInfosFromXML in class AbstractClassifier
Parameters:
xml - the xml-representation
Throws:
NonParsableException

getClassWeight

protected double getClassWeight(int index)
Returns the class weight for class index.

Parameters:
index - the index of the class
Returns:
the class weight

getScore

protected abstract double getScore(Sequence seq,
                                   int i,
                                   boolean check)
                            throws IllegalArgumentException,
                                   NotTrainedException,
                                   Exception
This method returns the score for a given sequence and a given class.

Parameters:
seq - the sequence
i - the index of the class
check - the switch to decide whether to check AlphabetContainer and length of the sequence or not
Returns:
the score
Throws:
NotTrainedException
IllegalArgumentException
Exception

getScores

public double[] getScores(Sample s)
                   throws Exception
This method returns the scores of the classifier for each sequence in the sample. The scores are stored in the array according to the index of the sequence in the sample.

Only for 2-class-classifiers.

Parameters:
s - the sample
Returns:
the array of scores
Throws:
Exception - if something went wrong

getPValue

public double getPValue(Sequence candidate,
                        Sample bg)
                 throws Exception
Returns the p-value for a sequence candidate with respect to a given background sample.

The p-value is the percentage of background sequences that have a score that is at least as high as the score for the candidate.

It is not recommended to use this method in a for loop. In such cases one should use the method that works on two samples.

Parameters:
candidate - the candidate sequence
bg - the background sample
Returns:
the p-value for the candidate
Throws:
Exception - if something went wrong
See Also:
getPValue(Sample, Sample)
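
The definition of the p-value given above can be sketched on plain score arrays. The class PValueSketch is hypothetical and works on precomputed scores rather than on Sequence and Sample objects. It returns a fraction between 0 and 1; whether the library scales the value differently is not stated here.

```java
/** Hypothetical helper for illustration; not part of Jstacs. */
final class PValueSketch {

    /** Fraction of background scores that are at least as high as the candidate score. */
    static double pValue(double candidateScore, double[] bgScores) {
        if (bgScores.length == 0) {
            throw new IllegalArgumentException("the background must not be empty");
        }
        int atLeast = 0;
        for (double b : bgScores) {
            if (b >= candidateScore) {
                atLeast++;
            }
        }
        return (double) atLeast / bgScores.length;
    }

    public static void main(String[] args) {
        double[] bg = {1.0, 2.0, 3.0, 4.0};
        System.out.println(pValue(2.5, bg));
    }
}
```

Each call scans the whole background, which is why calling such a method once per candidate in a loop repeats the same work.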

getPValue

public double[] getPValue(Sample candidates,
                          Sample bg)
                   throws Exception
Returns the p-values for all sequences in candidates with respect to a given background sample.

The p-value is the percentage of background sequences that have a score that is at least as high as the score of a candidate.

Parameters:
candidates - the candidate sequences
bg - the background sample
Returns:
the p-values for all sequences in candidates
Throws:
Exception - if something went wrong
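
One plausible way to compute p-values for many candidates at once is to sort the background scores a single time and then locate each candidate score by binary search. The sketch below works on precomputed score arrays; the class BatchPValueSketch is hypothetical, and the actual Jstacs implementation may proceed differently.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Jstacs. */
final class BatchPValueSketch {

    /** Index of the first element of the sorted array that is >= key. */
    static int lowerBound(double[] sorted, double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** P-values for all candidate scores, in the order of the candidates. */
    static double[] pValues(double[] candidateScores, double[] bgScores) {
        if (bgScores.length == 0) {
            throw new IllegalArgumentException("the background must not be empty");
        }
        double[] bg = bgScores.clone();
        Arrays.sort(bg); // sorted once, reused for every candidate
        double[] p = new double[candidateScores.length];
        for (int i = 0; i < p.length; i++) {
            // every element from lowerBound onward is >= the candidate score
            int atLeast = bg.length - lowerBound(bg, candidateScores[i]);
            p[i] = (double) atLeast / bg.length;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] bg = {4.0, 1.0, 3.0, 2.0};
        double[] candidates = {2.5, 4.0, 5.0, 0.0};
        System.out.println(Arrays.toString(pValues(candidates, bg)));
    }
}
```

With m background scores and k candidates, this costs O((m + k) log m) comparisons instead of O(m * k) for a loop over the single-sequence variant.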