The Dataset class encapsulates a realization of data from a DataSource.
#include <Dataset.h>
Public Member Functions | |
| Dataset (const Dataset &dataset) | |
| Construction of a Dataset from another Dataset. | |
| Dataset (DataSource &source, int train, int test) | |
| Construction of a Dataset from a generic DataSource. | |
| Dataset (const char *filename, int train, int test=0) | |
| Construction of a Dataset from a single file. | |
| Dataset (const char *filename, double train) | |
| Construction of a Dataset from a single file. | |
| Dataset (const char *trainfile, const char *testfile) | |
| Construction of a Dataset from a pair of files. | |
| Dataset (const char *trainfile, const char *testfile, int train) | |
| Construction of a Dataset from a pair of files, but with a different separation of the data into training and test sets. | |
| Dataset (const char *datafile, const char *splitfile, double disambiguation) | |
| Construction of a Dataset from a data file and a split file. | |
| Dataset (const Array< double > &trainingData, const Array< double > &trainingTarget, const Array< double > &testData, const Array< double > &testTarget) | |
| Construction of a Dataset object from given arrays. | |
| void | ShuffleTraining () |
| shuffles the training examples | |
| void | ShuffleTest () |
| shuffles the test examples | |
| void | ShuffleAll () |
| shuffles the union of training and test examples, such that the number of training and test examples remains unchanged | |
| const Array< double > & | getTrainingData () const |
| access to the training data as a constant array | |
| const Array< double > & | getTrainingTarget () const |
| access to the training targets as a constant array | |
| const Array< double > & | getTestData () const |
| access to the test data as a constant array | |
| const Array< double > & | getTestTarget () const |
| access to the test targets as a constant array | |
| bool | Save (const char *filename, bool training=true, bool test=true, const char *format="ascii") |
| Save the current Dataset to a file. | |
| bool | SaveLIBSVM (const char *filename, bool training=true, bool test=true) |
| Save the current Dataset in LIBSVM format. | |
| void | NormalizeComponents () |
| component-wise normalization of the dataset | |
| void | NormalizeComponent (int d) |
| normalizes a single component | |
Protected Member Functions | |
| bool | ReadSplitFile (const char *filename, std::vector< unsigned int > &train, std::vector< unsigned int > &test) |
| bool | ReadLine (FILE *file, char *buffer, int bufferlength) |
Protected Attributes | |
| Array< double > | trainingData |
| Array< double > | trainingTarget |
| Array< double > | testData |
| Array< double > | testTarget |
The Dataset class encapsulates a realization of data from a DataSource.
Definition at line 203 of file Dataset.h.
Dataset::Dataset (const Dataset &dataset)
Construction of a Dataset from another Dataset.
Definition at line 418 of file Dataset.cpp.
References getTestData(), getTestTarget(), getTrainingData(), getTrainingTarget(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (DataSource &source, int train, int test)
Construction of a Dataset from a generic DataSource.
Definition at line 426 of file Dataset.cpp.
References DataSource::GetData(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const char *filename, int train, int test=0)
Construction of a Dataset from a single file.
Definition at line 434 of file Dataset.cpp.
References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const char *filename, double train)
Construction of a Dataset from a single file.
Definition at line 446 of file Dataset.cpp.
References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const char *trainfile, const char *testfile)
Construction of a Dataset from a pair of files.
Definition at line 457 of file Dataset.cpp.
References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const char *trainfile, const char *testfile, int train)
Construction of a Dataset from a pair of files, but with a different separation of the data into training and test sets.
Definition at line 467 of file Dataset.cpp.
References DataFile::GetData(), DataFile::getNumberOfExamples(), i, testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const char *datafile, const char *splitfile, double disambiguation)
Construction of a Dataset from a data file and a split file.
Definition at line 529 of file Dataset.cpp.
References DataFile::GetData(), DataSource::getDataDimension(), DataFile::getNumberOfExamples(), DataSource::getTargetDimension(), i, ReadSplitFile(), testData, testTarget, trainingData, and trainingTarget.
Dataset::Dataset (const Array< double > &trainingData, const Array< double > &trainingTarget, const Array< double > &testData, const Array< double > &testTarget)
Construction of a Dataset object from given arrays.
Definition at line 575 of file Dataset.cpp.
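To illustrate the idea behind this constructor and the read-only accessors listed above, here is a minimal sketch of a container that stores four given arrays and exposes them as constant references. `MiniDataset` and `Matrix` are hypothetical stand-ins introduced for this example only; `std::vector` takes the place of `Array<double>`, and the size checks are an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Array<double>: one row per example.
using Matrix = std::vector<std::vector<double>>;

// Illustrative class, not part of the documented library.
class MiniDataset {
public:
    MiniDataset(Matrix trainingData, Matrix trainingTarget,
                Matrix testData, Matrix testTarget)
        : trainingData_(std::move(trainingData)),
          trainingTarget_(std::move(trainingTarget)),
          testData_(std::move(testData)),
          testTarget_(std::move(testTarget)) {
        // Assumed invariant: every example has exactly one target row.
        assert(trainingData_.size() == trainingTarget_.size());
        assert(testData_.size() == testTarget_.size());
    }

    // Read-only access to the four arrays.
    const Matrix& getTrainingData() const { return trainingData_; }
    const Matrix& getTrainingTarget() const { return trainingTarget_; }
    const Matrix& getTestData() const { return testData_; }
    const Matrix& getTestTarget() const { return testTarget_; }

private:
    Matrix trainingData_;
    Matrix trainingTarget_;
    Matrix testData_;
    Matrix testTarget_;
};
```

Returning constant references avoids copying the arrays on every access while keeping callers from modifying the stored split.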
const Array<double>& Dataset::getTestData () const [inline]
const Array<double>& Dataset::getTestTarget () const [inline]
const Array<double>& Dataset::getTrainingData () const [inline]
const Array<double>& Dataset::getTrainingTarget () const [inline]
void Dataset::NormalizeComponent (int d)
normalizes a single component
Definition at line 887 of file Dataset.cpp.
References i, testData, and trainingData.
void Dataset::NormalizeComponents ()
component-wise normalization of the dataset
Definition at line 853 of file Dataset.cpp.
References i, testData, and trainingData.
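The page does not say which normalization is applied. One common reading of component-wise normalization is to shift and scale each input component to zero mean and unit variance. The sketch below follows that reading under two further assumptions: the statistics are estimated on the training rows only, and the same transformation is then applied to the test rows. The function names and the `Matrix` alias are hypothetical and only mirror the documented method names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Array<double>: one row per example.
using Matrix = std::vector<std::vector<double>>;

// Rescale component d to zero mean and unit variance.
// Assumption: mean and variance come from the training rows and are
// reused unchanged for the test rows.
void normalizeComponent(Matrix& train, Matrix& test, std::size_t d) {
    assert(!train.empty());
    const double n = static_cast<double>(train.size());

    double mean = 0.0;
    for (const auto& row : train) mean += row[d];
    mean /= n;

    double variance = 0.0;
    for (const auto& row : train) variance += (row[d] - mean) * (row[d] - mean);
    variance /= n;

    // A constant component has zero variance; it is only centered.
    const double scale = (variance > 0.0) ? 1.0 / std::sqrt(variance) : 1.0;

    for (auto& row : train) row[d] = (row[d] - mean) * scale;
    for (auto& row : test) row[d] = (row[d] - mean) * scale;
}

// Apply the per-component normalization to every input dimension.
void normalizeComponents(Matrix& train, Matrix& test) {
    assert(!train.empty());
    for (std::size_t d = 0; d < train[0].size(); ++d) {
        normalizeComponent(train, test, d);
    }
}
```

Estimating the statistics on the training rows alone keeps information about the test set out of the preprocessing step; whether the library does the same cannot be determined from this page.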
bool Dataset::ReadLine (FILE *file, char *buffer, int bufferlength) [protected]
Definition at line 923 of file Dataset.cpp.
Referenced by ReadSplitFile().
bool Dataset::ReadSplitFile (const char *filename, std::vector< unsigned int > &train, std::vector< unsigned int > &test) [protected]
bool Dataset::Save (const char *filename, bool training=true, bool test=true, const char *format="ascii")
Save the current Dataset to a file.
| filename | name of the file, must not exist | |
| training | include the training data? | |
| test | include the test data? | |
| format | see description |
The data can be saved in one of several formats, selected by the format argument (default: "ascii").
Definition at line 665 of file Dataset.cpp.
References Dataset_WriteType, i, testData, testTarget, trainingData, and trainingTarget.
bool Dataset::SaveLIBSVM (const char *filename, bool training=true, bool test=true)
Save the current Dataset in LIBSVM format.
| filename | name of the file, must not exist | |
| training | include the training data? | |
| test | include the test data? |
Definition at line 803 of file Dataset.cpp.
References i, testData, testTarget, trainingData, and trainingTarget.
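For readers unfamiliar with the LIBSVM text format, each example occupies one line of the form `<label> <index>:<value> ...`, with 1-based feature indices in ascending order. The sketch below formats a single example that way. `toLibsvmLine` is a hypothetical helper written for this illustration, and omitting zero-valued features is an assumption about the output; the page does not show how SaveLIBSVM handles zeros or number formatting.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Format one example as a LIBSVM line: "<label> <index>:<value> ...".
// Indices are 1-based. Assumption: zero-valued features are skipped,
// which the sparse format permits.
std::string toLibsvmLine(double label, const std::vector<double>& features) {
    std::ostringstream out;
    out << label;
    for (std::size_t i = 0; i < features.size(); ++i) {
        if (features[i] != 0.0) {
            out << ' ' << (i + 1) << ':' << features[i];
        }
    }
    return out.str();
}
```

With this helper, `toLibsvmLine(1.0, {0.5, 0.0, 2.0})` yields `1 1:0.5 3:2`.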
void Dataset::ShuffleAll ()
shuffles the union of training and test examples, such that the number of training and test examples remains unchanged
Definition at line 624 of file Dataset.cpp.
References i, testData, testTarget, trainingData, and trainingTarget.
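The behavior described here (shuffle the union, keep both set sizes) can be sketched as pooling all examples, permuting the pool, and cutting it again at the original training size. This is only one way to obtain that effect, not necessarily how Dataset.cpp does it. `Example` and `shuffleAll` are hypothetical names, and the random number generator is passed in explicitly as a design choice of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in: one example with its input and target vectors.
struct Example {
    std::vector<double> data;
    std::vector<double> target;
};

// Shuffle the union of both sets; the set sizes stay unchanged.
void shuffleAll(std::vector<Example>& train, std::vector<Example>& test,
                std::mt19937& rng) {
    const std::ptrdiff_t numTrain = static_cast<std::ptrdiff_t>(train.size());

    // Pool all examples, training examples first.
    std::vector<Example> pool;
    pool.reserve(train.size() + test.size());
    pool.insert(pool.end(), train.begin(), train.end());
    pool.insert(pool.end(), test.begin(), test.end());

    std::shuffle(pool.begin(), pool.end(), rng);

    // Cut the permuted pool at the original training size.
    train.assign(pool.begin(), pool.begin() + numTrain);
    test.assign(pool.begin() + numTrain, pool.end());
}
```

Because each example carries its own target, data and targets cannot drift apart during the shuffle.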
void Dataset::ShuffleTest ()
shuffles the test examples
Definition at line 604 of file Dataset.cpp.
References i, testData, and testTarget.
void Dataset::ShuffleTraining ()
shuffles the training examples
Definition at line 584 of file Dataset.cpp.
References i, trainingData, and trainingTarget.
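Since this method touches both trainingData and trainingTarget, any shuffle has to apply the same permutation to both arrays so that each example keeps its target. A minimal sketch of that idea, using a Fisher-Yates pass over the rows, follows. `shuffleInUnison` and `Matrix` are hypothetical names; `std::vector` replaces `Array<double>`, and the actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for Array<double>: one row per example.
using Matrix = std::vector<std::vector<double>>;

// Fisher-Yates shuffle over rows; every swap is applied to both arrays,
// so row i of data still belongs to row i of target afterwards.
void shuffleInUnison(Matrix& data, Matrix& target, std::mt19937& rng) {
    assert(data.size() == target.size());
    for (std::size_t i = data.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        const std::size_t j = pick(rng);
        std::swap(data[i - 1], data[j]);
        std::swap(target[i - 1], target[j]);
    }
}
```

The same routine applied to the test arrays would give the effect described for ShuffleTest.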
Array<double> Dataset::testData [protected]
Definition at line 330 of file Dataset.h.
Referenced by Dataset(), NormalizeComponent(), NormalizeComponents(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTest().
Array<double> Dataset::testTarget [protected]
Definition at line 331 of file Dataset.h.
Referenced by Dataset(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTest().
Array<double> Dataset::trainingData [protected]
Definition at line 328 of file Dataset.h.
Referenced by Dataset(), NormalizeComponent(), NormalizeComponents(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTraining().
Array<double> Dataset::trainingTarget [protected]
Definition at line 329 of file Dataset.h.
Referenced by Dataset(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTraining().