java.lang.Object
  de.jstacs.data.Sample
public class Sample
This is the class for any sample of sequences. All sequences in a sample must share the same AlphabetContainer, but they may have different lengths.
Internally, the class Sequence is used, in which the symbols of the external alphabet are converted to integer values. The class Sample knows about this coding via instances of the classes AlphabetContainer and Alphabet, respectively.
There are different ways to access the elements of a Sample. For random access, use the method getElementAt( int i ). For fast sequential access, an ElementEnumerator is recommended.
Sample is immutable.
See Also: AlphabetContainer, Sequence
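The description above can be illustrated with a small standalone sketch. It is not the Jstacs implementation: the class `EncodedSampleSketch`, its method bodies, and the fixed alphabet "ACGT" are assumptions made for this example only. It shows symbols being mapped to integer codes, an element list that cannot change after construction, and both indexed and sequential access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the idea of a sample; not part of Jstacs.
final class EncodedSampleSketch implements Iterable<int[]> {
    private static final String ALPHABET = "ACGT"; // assumed external alphabet
    private final List<int[]> elements;

    EncodedSampleSketch(List<String> raw) {
        List<int[]> encoded = new ArrayList<>();
        for (String s : raw) {
            encoded.add(encode(s));
        }
        // Unmodifiable: elements cannot be added or removed after construction.
        this.elements = Collections.unmodifiableList(encoded);
    }

    // Maps each symbol to its index in the alphabet.
    static int[] encode(String s) {
        int[] codes = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            int code = ALPHABET.indexOf(s.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("symbol not in alphabet: " + s.charAt(i));
            }
            codes[i] = code;
        }
        return codes;
    }

    // Random access; returns a copy so callers cannot alter internal state.
    int[] getElementAt(int i) {
        return elements.get(i).clone();
    }

    int getNumberOfElements() {
        return elements.size();
    }

    // Sequential access; each element is handed out as a copy.
    @Override
    public Iterator<int[]> iterator() {
        return elements.stream().map(int[]::clone).iterator();
    }

    public static void main(String[] args) {
        EncodedSampleSketch sample = new EncodedSampleSketch(List.of("ACGT", "GG"));
        for (int[] element : sample) {
            System.out.println(Arrays.toString(element));
        }
    }
}
```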
Nested Class Summary | | |
---|---|---|
static class | Sample.ElementEnumerator | This class can be used for fast sequential access to a sample. |
static class | Sample.PartitionMethod | This enum defines different partition methods for a sample. |
static class | Sample.WeightedSampleFactory | This class enables you to eliminate sequences that occur more than once in one or more samples. |
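To make the idea of eliminating repeated sequences concrete, here is a minimal standalone sketch on plain strings. It does not use Sample.WeightedSampleFactory. The class name `DuplicateEliminationSketch` is hypothetical, and storing the number of occurrences as a weight is an assumption of this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Jstacs.
final class DuplicateEliminationSketch {
    // Collapses repeated sequences into one entry each.
    // The value is the number of occurrences across all given samples.
    static Map<String, Double> uniqueWithWeights(List<List<String>> samples) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (List<String> sample : samples) {
            for (String sequence : sample) {
                weights.merge(sequence, 1.0, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
                List.of("ACGT", "ACGT", "TTTT"),
                List.of("ACGT"));
        System.out.println(uniqueWithWeights(samples));
    }
}
```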
Constructor Summary | |
---|---|
Sample(AlphabetContainer abc, StringExtractor se) | Creates a Sample from a StringExtractor using the given AlphabetContainer. |
Sample(AlphabetContainer abc, StringExtractor se, int subsequenceLength) | Creates a Sample from a StringExtractor using the given AlphabetContainer and all overlapping windows of length subsequenceLength. |
Sample(AlphabetContainer abc, StringExtractor se, String delim) | Creates a Sample from a StringExtractor using the given AlphabetContainer and delimiter. |
Sample(AlphabetContainer abc, StringExtractor se, String delim, int subsequenceLength) | Creates a Sample from a StringExtractor using the given AlphabetContainer, the given delimiter and all overlapping windows of length subsequenceLength. |
Sample(Sample s, int subsequenceLength) | This constructor enables you to use subsequences of the elements of a sample. |
Sample(String annotation, Sequence... seqs) | This constructor is specially designed for the method Model.emitSample(int, int...). |
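Several constructors take a subsequenceLength and use all overlapping windows of that length. The following standalone sketch shows what such windows look like for a single string. It is an illustration only: `SlidingWindowSketch` is a hypothetical class, and it throws IllegalArgumentException where the Jstacs constructors declare WrongLengthException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jstacs.
final class SlidingWindowSketch {
    // Returns all overlapping windows of the given length.
    // A length of 0 keeps the sequence as given.
    static List<String> windows(String sequence, int subsequenceLength) {
        List<String> result = new ArrayList<>();
        if (subsequenceLength == 0) {
            result.add(sequence);
            return result;
        }
        if (subsequenceLength < 0 || subsequenceLength > sequence.length()) {
            throw new IllegalArgumentException("unsupported window length: " + subsequenceLength);
        }
        for (int start = 0; start + subsequenceLength <= sequence.length(); start++) {
            result.add(sequence.substring(start, start + subsequenceLength));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(windows("ACGTA", 3));
    }
}
```

A sequence of length n yields n - subsequenceLength + 1 windows in this sketch, so "ACGTA" with a window length of 3 gives three windows.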
Method Summary | | |
---|---|---|
Sequence[] | getAllElements() | Returns an array of sequences containing all elements of this Sample. |
AlphabetContainer | getAlphabetContainer() | Returns the AlphabetContainer of this Sample. |
String | getAnnotation() | This method returns some annotation of the sample. |
static String | getAnnotation(Sample... s) | Returns the annotation for an array of Samples. |
Sample | getCompositeSample(int[] starts, int[] lengths) | This method enables you to use only composite sequences of all elements in the current sample. |
Sequence | getElementAt(int i) | This method returns the element with index i. |
int | getElementLength() | Returns the length of the elements in this Sample. |
Sample | getInfixSample(int start, int length) | This method enables you to use only an infix of all elements in the current sample. |
int | getMaximalElementLength() | Returns the maximal length of an element in this Sample. |
int | getMinimalElementLength() | Returns the minimal length of an element in this Sample. |
int | getNumberOfElements() | Returns the number of elements in this Sample. |
int | getNumberOfElementsWithLength(int len) | Returns the number of overlapping elements that can be extracted. |
Sample | getSuffixSample(int start) | This method enables you to use only a suffix of all elements in the current sample. |
static Sample | intersection(Sample... samples) | This method computes the intersection between all elements of the array. |
boolean | isDiscreteSample() | This method returns true if all positions use discrete values. |
boolean | isSimpleSample() | This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet. |
Sample[] | partition(double p, Sample.PartitionMethod method, int subsequenceLength) | This method partitions the elements of the sample into 2 distinct parts. |
Sample[] | partition(int k, Sample.PartitionMethod method) | This method partitions the elements of the sample into k distinct parts. |
Sample[] | partition(Sample.PartitionMethod method, double... percentage) | This method partitions the elements of the sample into distinct parts. |
void | save(String msg, File f) | This method writes a message msg and the sample to a file f. |
Sample | subSampling(int number) | Randomly samples elements (sequences) from the set of all elements (sequences) contained in this Sample. |
String | toString() | |
static Sample | union(Sample... s) | Unites all samples in s. |
static Sample | union(Sample[] s, boolean[] in) | This method unites all Samples from s with respect to in. |
static Sample | union(Sample[] s, boolean[] in, int subsequenceLength) | This method unites all Samples from s with respect to in and sets the element length in the united sample to subsequenceLength. |
static Sample | union(Sample[] s, int subsequenceLength) | This method unites all Samples from s and sets the element length in the united sample to subsequenceLength. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public Sample(AlphabetContainer abc, StringExtractor se) throws WrongAlphabetException, EmptySampleException, WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer.
- Parameters:
abc - the AlphabetContainer
se - the StringExtractor
- Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens

public Sample(AlphabetContainer abc, StringExtractor se, int subsequenceLength) throws WrongAlphabetException, WrongLengthException, EmptySampleException
Creates a Sample from a StringExtractor using the given AlphabetContainer and all overlapping windows of length subsequenceLength.
- Parameters:
abc - the AlphabetContainer
se - the StringExtractor
subsequenceLength - the length of the window sliding over the Strings of se. If subsequenceLength is 0 (zero), then the sequences are used as given by the StringExtractor.
- Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
WrongLengthException - if the subsequence length is not supported
EmptySampleException - if the Sample would be empty

public Sample(AlphabetContainer abc, StringExtractor se, String delim) throws WrongAlphabetException, EmptySampleException, WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer and delimiter.
- Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
- Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens

public Sample(AlphabetContainer abc, StringExtractor se, String delim, int subsequenceLength) throws EmptySampleException, WrongAlphabetException, WrongLengthException
Creates a Sample from a StringExtractor using the given AlphabetContainer, the given delimiter and all overlapping windows of length subsequenceLength.
- Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
subsequenceLength - the length of the window sliding over the Strings of se. If subsequenceLength is 0 (zero), then the sequences are used as given by the StringExtractor.
- Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - if the subsequence length is not supported

public Sample(Sample s, int subsequenceLength) throws WrongLengthException
This constructor enables you to use subsequences of the elements of a sample. In a Sample created this way, the sequences returned by getElementAt( int i ) are real objects and do not have to be created at the invocation of the method. (The same holds for the ElementEnumerator. In those cases both ways to access the sequences are approximately equally fast.)
- Parameters:
s - the sample
subsequenceLength - the new element length
- Throws:
WrongLengthException - if something is wrong with subsequenceLength

public Sample(String annotation, Sequence... seqs) throws EmptySampleException, IllegalArgumentException
This constructor is specially designed for the method Model.emitSample(int, int...).
- Parameters:
annotation - the annotation of the sample
seqs - the sequence(s)
- Throws:
EmptySampleException - if the array seqs is null or its length is 0
IllegalArgumentException - if the alphabets do not match

Method Detail |
---|
public static final String getAnnotation(Sample... s)
Returns the annotation for an array of Samples.
- Parameters:
s - an array of Samples
public static final Sample intersection(Sample... samples) throws IllegalArgumentException, EmptySampleException
This method computes the intersection between all elements of the array.
- Parameters:
samples - the array
- Throws:
IllegalArgumentException - if the elements of the array are from different domains
EmptySampleException - if the intersection is empty

public static final Sample union(Sample[] s, boolean[] in) throws IllegalArgumentException, EmptySampleException
This method unites all Samples from s with respect to in.
- Parameters:
s - the Samples
in - an array indicating which samples are used in the union; if in[i]==true, the sample s[i] is used
- Throws:
IllegalArgumentException - if s.length != in.length or the alphabets do not match
EmptySampleException - if the union is empty

public static final Sample union(Sample... s) throws IllegalArgumentException
Unites all samples in s.
- Parameters:
s - the samples
- Throws:
IllegalArgumentException - if the alphabets do not match
- See Also:
union(Sample[], boolean[])
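The set operations described above can be sketched on plain lists of strings. This is a standalone illustration, not the Jstacs code: `SetOperationsSketch` is a hypothetical class, the union here simply concatenates and keeps duplicates (an assumption), and the alphabet check is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Jstacs.
final class SetOperationsSketch {
    // Keeps only the sequences that occur in every input list.
    static Set<String> intersection(List<List<String>> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples given");
        }
        Set<String> common = new LinkedHashSet<>(samples.get(0));
        for (List<String> sample : samples.subList(1, samples.size())) {
            common.retainAll(sample);
        }
        return common;
    }

    // Concatenates samples.get(i) for every i with in[i] == true.
    static List<String> union(List<List<String>> samples, boolean[] in) {
        if (samples.size() != in.length) {
            throw new IllegalArgumentException("samples.size() != in.length");
        }
        List<String> united = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            if (in[i]) {
                united.addAll(samples.get(i));
            }
        }
        return united;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(List.of("A", "B", "C"), List.of("B", "C", "D"));
        System.out.println(intersection(samples));
        System.out.println(union(samples, new boolean[] {true, false}));
    }
}
```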
public static final Sample union(Sample[] s, boolean[] in, int subsequenceLength) throws IllegalArgumentException, EmptySampleException, WrongLengthException
This method unites all Samples from s with respect to in and sets the element length in the united sample to subsequenceLength.
- Parameters:
s - the Samples
in - an array indicating which samples are used in the union; if in[i]==true, the sample s[i] is used
subsequenceLength - the length of the elements in the united sample
- Throws:
IllegalArgumentException - if s.length != in.length or the alphabets do not match
EmptySampleException - if the union is empty
WrongLengthException - if the united sample does not support this subsequenceLength

public static final Sample union(Sample[] s, int subsequenceLength) throws IllegalArgumentException, WrongLengthException
This method unites all Samples from s and sets the element length in the united sample to subsequenceLength.
- Parameters:
s - the Samples
subsequenceLength - the length of the elements in the united sample
- Throws:
IllegalArgumentException - if the alphabets do not match
WrongLengthException - if the united sample does not support this subsequenceLength
- See Also:
union(Sample[], boolean[], int)
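public Sequence[] getAllElements()
Returns an array of sequences containing all elements of this Sample.

public final AlphabetContainer getAlphabetContainer()
Returns the AlphabetContainer of this Sample.

public final String getAnnotation()
This method returns some annotation of the sample.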
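public final Sample getCompositeSample(int[] starts, int[] lengths) throws IllegalArgumentException
This method enables you to use only composite sequences of all elements in the current sample.
- Parameters:
starts - the start positions
lengths - the lengths of the chunks
- Throws:
IllegalArgumentException - if either starts or lengths or both in combination are not suitable

public Sequence getElementAt(int i)
This method returns the element with index i. See also the comment on the constructor Sample(Sample, int).
- Parameters:
i - the index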
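As a rough picture of what a composite of chunks means, the sketch below builds one from a single string. It is a standalone illustration under assumptions: `CompositeSketch` is a hypothetical class, start positions are taken to be zero-based, and the chunks are concatenated in the order given.

```java
// Hypothetical helper; not part of Jstacs.
final class CompositeSketch {
    // Concatenates the chunks covering positions starts[k] (inclusive)
    // to starts[k] + lengths[k] (exclusive).
    static String composite(String sequence, int[] starts, int[] lengths) {
        if (starts.length != lengths.length) {
            throw new IllegalArgumentException("starts and lengths differ in size");
        }
        StringBuilder result = new StringBuilder();
        for (int k = 0; k < starts.length; k++) {
            int end = starts[k] + lengths[k];
            if (starts[k] < 0 || lengths[k] < 0 || end > sequence.length()) {
                throw new IllegalArgumentException("chunk " + k + " does not fit the sequence");
            }
            result.append(sequence, starts[k], end);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(composite("ACGTACGT", new int[] {0, 5}, new int[] {2, 3}));
    }
}
```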
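public int getElementLength()
Returns the length of the elements in this Sample.

public final Sample getInfixSample(int start, int length) throws IllegalArgumentException
This method enables you to use only an infix of all elements in the current sample.
- Parameters:
start - the start position of the infix
length - the length of the infix, has to be positive
- Throws:
IllegalArgumentException - if either start or length or both in combination are not suitable

public int getMinimalElementLength()
Returns the minimal length of an element in this Sample.

public int getMaximalElementLength()
Returns the maximal length of an element in this Sample.

public int getNumberOfElements()
Returns the number of elements in this Sample.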
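The infix and length queries above can be pictured with plain strings. The following is a minimal standalone sketch, assuming zero-based start positions. `InfixSketch` and its methods are hypothetical names and do not call Jstacs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jstacs.
final class InfixSketch {
    // Cuts the same infix out of every sequence.
    static List<String> infix(List<String> sample, int start, int length) {
        if (start < 0 || length <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and length must be positive");
        }
        List<String> result = new ArrayList<>();
        for (String sequence : sample) {
            if (start + length > sequence.length()) {
                throw new IllegalArgumentException("infix does not fit sequence " + sequence);
            }
            result.add(sequence.substring(start, start + length));
        }
        return result;
    }

    // Length of the shortest sequence; the sample must not be empty.
    static int minimalLength(List<String> sample) {
        return sample.stream().mapToInt(String::length).min().getAsInt();
    }

    // Length of the longest sequence; the sample must not be empty.
    static int maximalLength(List<String> sample) {
        return sample.stream().mapToInt(String::length).max().getAsInt();
    }

    public static void main(String[] args) {
        List<String> sample = List.of("ACGTA", "TTGCA");
        System.out.println(infix(sample, 1, 3));
    }
}
```

In this sketch an infix request is only valid if start + length does not exceed the minimal element length.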
public int getNumberOfElementsWithLength(int len) throws WrongLengthException
Returns the number of overlapping elements that can be extracted.
- Parameters:
len - the length of the elements
- Throws:
WrongLengthException - if the given length is bigger than the minimal element length

public final Sample getSuffixSample(int start) throws IllegalArgumentException
This method enables you to use only a suffix of all elements in the current sample.
- Parameters:
start - the start position of the suffix
- Throws:
IllegalArgumentException - if start is not suitable

public final boolean isSimpleSample()
This method answers the question whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet.
- Returns:
true if the sample is simple

public final boolean isDiscreteSample()
This method returns true if all positions use discrete values.
- Returns:
true if the sample is discrete

public Sample[] partition(double p, Sample.PartitionMethod method, int subsequenceLength) throws WrongLengthException, UnsupportedOperationException, EmptySampleException
This method partitions the elements of the sample into 2 distinct parts. The second part (test sample) holds the percentage p, the first part holds the rest (train sample). The first part has the same element length as the current sample, the second part has element length subsequenceLength.
- Parameters:
p - the percentage for the second part; the second part holds at least this percentage of the full sample
method - the method how to partition the sample (partitioning criterion)
subsequenceLength - the element length of the second part. If subsequenceLength is 0 (zero), then the sequences are used as given in this Sample.
- Throws:
WrongLengthException - if something is wrong with subsequenceLength
UnsupportedOperationException - if the sample is not simple
EmptySampleException - if at least one of the created partitions is empty
- See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
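The two-part split described above (a train part and a test part that holds at least the fraction p) might look roughly like this on a plain list. This is a sketch under assumptions, not the Jstacs algorithm: `TrainTestSplitSketch` is hypothetical, elements are counted rather than symbols, the random assignment with a seed is an assumption of this example, and the subsequenceLength handling is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not part of Jstacs.
final class TrainTestSplitSketch {
    // Returns [train, test]; test receives at least the fraction p of all elements.
    static List<List<String>> split(List<String> sample, double p, long seed) {
        if (p <= 0 || p >= 1) {
            throw new IllegalArgumentException("p must be in (0,1)");
        }
        List<String> shuffled = new ArrayList<>(sample);
        Collections.shuffle(shuffled, new Random(seed));
        // Rounding up guarantees "at least" the fraction p for the test part.
        int testSize = (int) Math.ceil(p * shuffled.size());
        int trainSize = shuffled.size() - testSize;
        if (trainSize == 0 || testSize == 0) {
            throw new IllegalStateException("a partition would be empty");
        }
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7");
        List<List<String>> parts = split(sample, 0.25, 42L);
        System.out.println("train: " + parts.get(0) + ", test: " + parts.get(1));
    }
}
```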
public Sample[] partition(Sample.PartitionMethod method, double... percentage) throws IllegalArgumentException, EmptySampleException
This method partitions the elements of the sample into distinct parts.
- Parameters:
method - the method how to partition the sample (partitioning criterion)
percentage - the array of percentages for each "subsample"
- Throws:
IllegalArgumentException - if something with the percentages is not correct (sum != 1 or one value not in [0,1])
EmptySampleException - if at least one of the created partitions is empty
- See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
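The constraints on the percentages (each value in [0,1], sum equal to 1, no empty part) can be made concrete with a standalone sketch. `PercentagePartitionSketch` is a hypothetical class. It splits by number of elements in the given order, which is an assumption and not necessarily what either Sample.PartitionMethod value does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jstacs.
final class PercentagePartitionSketch {
    static List<List<String>> partition(List<String> sample, double... percentage) {
        double sum = 0;
        for (double p : percentage) {
            if (p < 0 || p > 1) {
                throw new IllegalArgumentException("percentage not in [0,1]: " + p);
            }
            sum += p;
        }
        // Compare with a tolerance because of floating point rounding.
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("percentages do not sum to 1");
        }
        List<List<String>> parts = new ArrayList<>();
        double cumulative = 0;
        int from = 0;
        for (int k = 0; k < percentage.length; k++) {
            cumulative += percentage[k];
            // The last part always ends at the final element to absorb rounding.
            int to = (k == percentage.length - 1)
                    ? sample.size()
                    : (int) Math.round(cumulative * sample.size());
            if (to <= from) {
                throw new IllegalStateException("a partition would be empty");
            }
            parts.add(new ArrayList<>(sample.subList(from, to)));
            from = to;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9");
        System.out.println(partition(sample, 0.5, 0.3, 0.2));
    }
}
```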
public Sample[] partition(int k, Sample.PartitionMethod method) throws IllegalArgumentException, EmptySampleException
This method partitions the elements of the sample into k distinct parts.
- Parameters:
k - the number of parts
method - how to split the data
- Throws:
IllegalArgumentException - if k is not correct
EmptySampleException - if at least one of the created partitions is empty
- See Also:
Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
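A split into k parts of nearly equal size, as is typical for k-fold cross validation, can be sketched as follows. This standalone example is not the Jstacs code: `KPartitionSketch` is a hypothetical class, and the round-robin assignment by element index is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jstacs.
final class KPartitionSketch {
    // Assigns element i to part i % k, so part sizes differ by at most one.
    static List<List<String>> partition(List<String> sample, int k) {
        if (k < 1 || k > sample.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of elements");
        }
        List<List<String>> parts = new ArrayList<>();
        for (int j = 0; j < k; j++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < sample.size(); i++) {
            parts.get(i % k).add(sample.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("s0", "s1", "s2", "s3", "s4", "s5", "s6");
        System.out.println(partition(sample, 3));
    }
}
```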
public Sample subSampling(int number) throws EmptySampleException
Randomly samples elements (sequences) from the set of all elements (sequences) contained in this Sample.
- Parameters:
number - the number of Sequences that should be drawn from the contained set of sequences (with replacement)
- Throws:
EmptySampleException - if number is not positive

public final void save(String msg, File f) throws IOException
This method writes a message msg and the sample to a file f.
- Parameters:
msg - the message, any information
f - the File
- Throws:
IOException - if something went wrong with the file
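For the subSampling entry above, drawing with replacement means the same element may be picked more than once, and more elements may be drawn than the source contains. A minimal standalone sketch follows. `SubSamplingSketch` and the explicit seed parameter are assumptions of this example, and it throws IllegalArgumentException where Jstacs declares EmptySampleException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not part of Jstacs.
final class SubSamplingSketch {
    // Draws 'number' elements uniformly at random, with replacement.
    static List<String> subSampling(List<String> sample, int number, long seed) {
        if (number <= 0) {
            throw new IllegalArgumentException("number must be positive");
        }
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("cannot draw from an empty sample");
        }
        Random random = new Random(seed);
        List<String> drawn = new ArrayList<>(number);
        for (int i = 0; i < number; i++) {
            drawn.add(sample.get(random.nextInt(sample.size())));
        }
        return drawn;
    }

    public static void main(String[] args) {
        System.out.println(subSampling(List.of("A", "B", "C"), 5, 1L));
    }
}
```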
public String toString()
- Overrides:
toString in class Object