Dataset Class Reference

The Dataset class encapsulates a realization of data from a DataSource.

#include <Dataset.h>

Public Member Functions

 Dataset (const Dataset &dataset)
 Construction of a Dataset from another Dataset.
 Dataset (DataSource &source, int train, int test)
 Construction of a Dataset from a generic DataSource.
 Dataset (const char *filename, int train, int test=0)
 Construction of a Dataset from a single file.
 Dataset (const char *filename, double train)
 Construction of a Dataset from a single file.
 Dataset (const char *trainfile, const char *testfile)
 Construction of a Dataset from a pair of files.
 Dataset (const char *trainfile, const char *testfile, int train)
 Construction of a Dataset from a pair of files, but using a different data separation into training and test set.
 Dataset (const char *datafile, const char *splitfile, double disambiguation)
 Construction of a Dataset from a data file and a split file.
 Dataset (const Array< double > &trainingData, const Array< double > &trainingTarget, const Array< double > &testData, const Array< double > &testTarget)
 Construction of a Dataset object from given arrays.
void ShuffleTraining ()
 shuffles the training examples
void ShuffleTest ()
 shuffles the test examples
void ShuffleAll ()
 shuffles the union of training and test examples, such that the number of training and test examples remains unchanged
const Array< double > & getTrainingData () const
 access to the training data as a constant array
const Array< double > & getTrainingTarget () const
 access to the training targets as a constant array
const Array< double > & getTestData () const
 access to the test data as a constant array
const Array< double > & getTestTarget () const
 access to the test targets as a constant array
bool Save (const char *filename, bool training=true, bool test=true, const char *format="ascii")
 Save the current Dataset to a file.
bool SaveLIBSVM (const char *filename, bool training=true, bool test=true)
 Save the current Dataset in LIBSVM format.
void NormalizeComponents ()
 component-wise normalization of the dataset

Protected Member Functions

bool ReadSplitFile (const char *filename, std::vector< unsigned int > &train, std::vector< unsigned int > &test)
bool ReadLine (FILE *file, char *buffer, int bufferlength)

Protected Attributes

Array< double > trainingData
Array< double > trainingTarget
Array< double > testData
Array< double > testTarget


Detailed Description

The Dataset class encapsulates a realization of data from a DataSource.

A Dataset consists of separate training and test data. It may be necessary to split the training data into sub-blocks, e.g. for cross-validation. The test data, however, are assumed to be completely unknown during the whole training process.
Examples:

CrossValidation.cpp, KernelOptimization.cpp, KM.cpp, KNN.cpp, LinearClassifierTest.cpp, LinearRegressionTest.cpp, McSvm.cpp, SvmApproximationExample.cpp, SVMclassification-gnuplot.cpp, and SVMclassification.cpp.

Definition at line 203 of file Dataset.h.
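
The description above mentions splitting the training data into sub-blocks for cross-validation while the test data stay untouched. The following is a minimal, self-contained sketch of that idea. It does not use the Dataset or Array classes; the helper name makeFolds and the use of std::vector are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Dataset.h): partition the indices
// 0 .. numTraining-1 into `folds` contiguous blocks of nearly equal size.
// Only training indices are partitioned; test examples are never touched.
std::vector<std::vector<std::size_t> > makeFolds(std::size_t numTraining, std::size_t folds)
{
    assert(folds > 0 && folds <= numTraining);
    std::vector<std::vector<std::size_t> > result(folds);
    const std::size_t base = numTraining / folds;   // minimum block size
    const std::size_t extra = numTraining % folds;  // the first `extra` blocks get one more index
    std::size_t next = 0;
    for (std::size_t f = 0; f < folds; ++f) {
        const std::size_t size = base + (f < extra ? 1 : 0);
        for (std::size_t i = 0; i < size; ++i) result[f].push_back(next++);
    }
    return result;
}
```

In a cross-validation loop, each block would serve once as the validation part while the remaining blocks are used for training. Shuffling the training examples beforehand (see ShuffleTraining()) is a common way to avoid order effects in the contiguous blocks.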


Constructor & Destructor Documentation

Dataset::Dataset ( const Dataset &  dataset  ) 

Construction of a Dataset from another Dataset.

Definition at line 418 of file Dataset.cpp.

References getTestData(), getTestTarget(), getTrainingData(), getTrainingTarget(), testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( DataSource &  source,
int  train,
int  test 
)

Construction of a Dataset from a generic DataSource.

Definition at line 426 of file Dataset.cpp.

References DataSource::GetData(), testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( const char *  filename,
int  train,
int  test = 0 
)

Construction of a Dataset from a single file.

Definition at line 434 of file Dataset.cpp.

References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( const char *  filename,
double  train 
)

Construction of a Dataset from a single file.

Definition at line 446 of file Dataset.cpp.

References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( const char *  trainfile,
const char *  testfile 
)

Construction of a Dataset from a pair of files.

Definition at line 457 of file Dataset.cpp.

References DataFile::GetData(), DataFile::getNumberOfExamples(), testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( const char *  trainfile,
const char *  testfile,
int  train 
)

Construction of a Dataset from a pair of files, but using a different data separation into training and test set.

Definition at line 467 of file Dataset.cpp.

References DataFile::GetData(), DataFile::getNumberOfExamples(), i, testData, testTarget, trainingData, and trainingTarget.

Dataset::Dataset ( const char *  datafile,
const char *  splitfile,
double  disambiguation 
)

Construction of a Dataset from a data file and a split file.

Dataset::Dataset ( const Array< double > &  trainingData,
const Array< double > &  trainingTarget,
const Array< double > &  testData,
const Array< double > &  testTarget 
)

Construction of a Dataset object from given arrays.

Definition at line 575 of file Dataset.cpp.


Member Function Documentation

const Array<double>& Dataset::getTestData (  )  const [inline]

access to the test data as a constant array

Examples:
CrossValidation.cpp, KM.cpp, KNN.cpp, SvmApproximationExample.cpp, SVMclassification-gnuplot.cpp, and SVMclassification.cpp.

Definition at line 257 of file Dataset.h.

Referenced by Dataset().

const Array<double>& Dataset::getTestTarget (  )  const [inline]

access to the test targets as a constant array

Examples:
CrossValidation.cpp, KM.cpp, KNN.cpp, SvmApproximationExample.cpp, SVMclassification-gnuplot.cpp, and SVMclassification.cpp.

Definition at line 263 of file Dataset.h.

Referenced by Dataset().

const Array<double>& Dataset::getTrainingData (  )  const [inline]

access to the training data as a constant array

Examples:
CrossValidation.cpp, KernelOptimization.cpp, KM.cpp, KNN.cpp, SvmApproximationExample.cpp, SVMclassification-gnuplot.cpp, and SVMclassification.cpp.

Definition at line 245 of file Dataset.h.

Referenced by Dataset().

const Array<double>& Dataset::getTrainingTarget (  )  const [inline]

access to the training targets as a constant array

Examples:
CrossValidation.cpp, KernelOptimization.cpp, KM.cpp, KNN.cpp, SvmApproximationExample.cpp, SVMclassification-gnuplot.cpp, and SVMclassification.cpp.

Definition at line 251 of file Dataset.h.

Referenced by Dataset().

void Dataset::NormalizeComponents (  ) 

component-wise normalization of the dataset

Normalize each component of the dataset by an affine linear transformation such that afterwards the training set has zero mean and unit variance in every component.

Definition at line 853 of file Dataset.cpp.

References i, testData, and trainingData.
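
The normalization described above can be sketched in standalone form. This is not the code from Dataset.cpp: the function name normalizeComponents, the Matrix typedef, the use of the population variance, and the handling of constant components are all assumptions. The sketch estimates mean and standard deviation on the training rows only and applies the same affine map to both sets, so that the training set ends up with zero mean and unit variance in every component.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // one row per example

// Hypothetical stand-in for the idea behind NormalizeComponents():
// statistics come from the training rows only; the resulting affine map
// x -> (x - mean) / stddev is applied to training and test rows alike.
void normalizeComponents(Matrix& training, Matrix& test)
{
    if (training.empty()) return;
    const std::size_t dim = training[0].size();
    const double n = static_cast<double>(training.size());
    for (std::size_t d = 0; d < dim; ++d) {
        double mean = 0.0;
        for (std::size_t i = 0; i < training.size(); ++i) mean += training[i][d];
        mean /= n;
        double var = 0.0;
        for (std::size_t i = 0; i < training.size(); ++i) {
            const double diff = training[i][d] - mean;
            var += diff * diff;
        }
        var /= n;  // population variance (assumption: the source does not name the estimator)
        // Assumption: a constant component is only centered, to avoid division by zero.
        const double scale = (var > 0.0) ? 1.0 / std::sqrt(var) : 1.0;
        for (std::size_t i = 0; i < training.size(); ++i)
            training[i][d] = (training[i][d] - mean) * scale;
        for (std::size_t i = 0; i < test.size(); ++i)
            test[i][d] = (test[i][d] - mean) * scale;
    }
}
```

Because the statistics are taken from the training set alone, the test set will in general not have exactly zero mean and unit variance after the transformation. This keeps the test data out of the fitting step, in line with the class description.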

bool Dataset::ReadLine ( FILE *  file,
char *  buffer,
int  bufferlength 
) [protected]

Definition at line 887 of file Dataset.cpp.

Referenced by ReadSplitFile().

bool Dataset::ReadSplitFile ( const char *  filename,
std::vector< unsigned int > &  train,
std::vector< unsigned int > &  test 
) [protected]

Definition at line 903 of file Dataset.cpp.

References i, and ReadLine().

Referenced by Dataset().

bool Dataset::Save ( const char *  filename,
bool  training = true,
bool  test = true,
const char *  format = "ascii" 
)

Save the current Dataset to a file.

Parameters:
filename name of the file; the file must not already exist
training include the training data?
test include the test data?
format see description
Returns:
The method returns true on success and false in case of failure.
The data can be saved in one of the following formats:
  • ascii: text file with one ASCII-encoded number per input and output
  • sparse: text file with sparse encoding, useful for datasets whose inputs contain many zeros
  • float: binary file with one IEEE float number per input and output
  • double: binary file with one IEEE double number per input and output
  • int8: binary file with a signed 8-bit-integer per input and output
  • int16: binary file with a signed 16-bit-integer per input and output
  • int32: binary file with a signed 32-bit-integer per input and output
  • uint8: binary file with an unsigned 8-bit-integer per input and output
  • uint16: binary file with an unsigned 16-bit-integer per input and output
  • uint32: binary file with an unsigned 32-bit-integer per input and output

Definition at line 665 of file Dataset.cpp.

References Dataset_WriteType, i, testData, testTarget, trainingData, and trainingTarget.

bool Dataset::SaveLIBSVM ( const char *  filename,
bool  training = true,
bool  test = true 
)

Save the current Dataset in LIBSVM format.

Parameters:
filename name of the file; the file must not already exist
training include the training data?
test include the test data?
Returns:
The method returns true on success and false in case of failure.

Definition at line 803 of file Dataset.cpp.

References i, testData, testTarget, trainingData, and trainingTarget.
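
For orientation, the LIBSVM text format stores one example per line as a label followed by index:value pairs with 1-based feature indices, and zero-valued features may be omitted. The sketch below formats a single example in that way. The helper name toLibsvmLine is hypothetical, and whether SaveLIBSVM() omits zeros or formats numbers exactly like this is not stated here, so treat it as an illustration of the file format rather than of the method's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format one example as a line of LIBSVM text,
// "<label> <index>:<value> ...", with 1-based indices and zeros omitted.
std::string toLibsvmLine(double label, const std::vector<double>& input)
{
    std::ostringstream out;
    out << label;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != 0.0) out << ' ' << (i + 1) << ':' << input[i];
    }
    return out.str();
}
```

Omitting zeros is what makes the format compact for sparse inputs, similar in spirit to the sparse option of Save().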

void Dataset::ShuffleAll (  ) 

shuffles the union of training and test examples, such that the number of training and test examples remains unchanged

Definition at line 624 of file Dataset.cpp.

References i, testData, testTarget, trainingData, and trainingTarget.
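
One way to shuffle the union of both sets while keeping their sizes fixed is to pool all examples, permute the pool, and cut it at the original training-set size. The standalone sketch below follows that approach. It is an assumption about the general technique, not the code from Dataset.cpp; the names shuffleAll and Example, and the use of std::mt19937, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

typedef std::pair<std::vector<double>, double> Example;  // (input, target)

// Hypothetical sketch of the idea behind ShuffleAll(): pool both sets,
// permute the pool, then cut it at the original training-set size.
// Inputs and targets travel together because they share one pair.
void shuffleAll(std::vector<Example>& training, std::vector<Example>& test, std::mt19937& rng)
{
    const std::ptrdiff_t cut = static_cast<std::ptrdiff_t>(training.size());
    std::vector<Example> pool(training);
    pool.insert(pool.end(), test.begin(), test.end());
    std::shuffle(pool.begin(), pool.end(), rng);
    training.assign(pool.begin(), pool.begin() + cut);
    test.assign(pool.begin() + cut, pool.end());
}
```

After the call, examples may have moved between the two sets, but each set keeps its original number of examples and every input stays paired with its target.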

void Dataset::ShuffleTest (  ) 

shuffles the test examples

Definition at line 604 of file Dataset.cpp.

References i, testData, and testTarget.

void Dataset::ShuffleTraining (  ) 

shuffles the training examples

Definition at line 584 of file Dataset.cpp.

References i, trainingData, and trainingTarget.


Member Data Documentation

Array<double> Dataset::testData [protected]

Definition at line 320 of file Dataset.h.

Referenced by Dataset(), NormalizeComponents(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTest().

Array<double> Dataset::testTarget [protected]

Definition at line 321 of file Dataset.h.

Referenced by Dataset(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTest().

Array<double> Dataset::trainingData [protected]

Definition at line 318 of file Dataset.h.

Referenced by Dataset(), NormalizeComponents(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTraining().

Array<double> Dataset::trainingTarget [protected]

Definition at line 319 of file Dataset.h.

Referenced by Dataset(), Save(), SaveLIBSVM(), ShuffleAll(), and ShuffleTraining().


The documentation for this class was generated from the following files: