DataFile Class Reference

The DataFile class is a DataSource based upon a file.

#include <Dataset.h>

Inheritance diagram for DataFile:
DataSource

Public Member Functions

 DataFile (const char *filename)
 Constructor.
 ~DataFile ()
 Destructor.
bool GetData (Array< double > &data, Array< double > &target, int count)
 Deliver data from the file.
bool GetData (Array< double > &training_data, Array< double > &training_target, int training, Array< double > &test_data, Array< double > &test_target, int test, bool shuffle=false)
 Deliver data from the file.
int getNumberOfExamples ()
 Return the number of examples available.

Protected Member Functions

bool ReadHeaderLine ()
bool ReadExample (Array< double > &data, Array< double > &target, int number)
int ReadToken (char *buffer, int maxlength, const char *separators)
int DiscardUntil (const char *separators)

Protected Attributes

FILE * file
 open file stream
int numberOfExamples
 number of examples
int format
 0=ascii, 1=sparse, 2=float, 3=double
int currentExample
 number of examples already delivered

Detailed Description

The DataFile class is a DataSource based upon a file.

ReClaM defines a simple file format for real-valued datasets. The first line, which starts with a hash sign ('#'), serves as the file header. It contains exactly four whitespace-separated tokens with the following meanings:
  1. number of examples
  2. input dimension
  3. output/target dimension
  4. data format: one of the keywords "ascii", "sparse", "float", "double", "int8", "int16", "int32", "uint8", "uint16" or "uint32"
Generally, the data are organized sample by sample, and for each sample, the inputs are followed by the outputs.
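The header layout described above can be illustrated with a minimal sketch. The names HeaderInfo and parseHeader are hypothetical and not part of ReClaM, and the sketch only approximates what ReadHeaderLine() might accept; for example, it assumes the four tokens may follow the '#' with or without intervening whitespace.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for the four header tokens (not a ReClaM type).
struct HeaderInfo {
    int numberOfExamples = 0;
    int inputDim = 0;
    int targetDim = 0;
    std::string format;  // e.g. "ascii", "sparse", "float", "double", ...
};

// Parse a header line such as "# 150 4 3 ascii".
// Returns false if the line does not start with '#' or does not contain
// exactly four tokens of the expected types. On failure, 'out' is untouched.
bool parseHeader(const std::string& line, HeaderInfo& out) {
    if (line.empty() || line[0] != '#') return false;

    std::istringstream in(line.substr(1));
    HeaderInfo h;
    if (!(in >> h.numberOfExamples >> h.inputDim >> h.targetDim >> h.format))
        return false;

    std::string extra;
    if (in >> extra) return false;  // more than four tokens

    out = h;
    return true;
}
```

With this sketch, the line "# 150 4 3 ascii" would describe 150 examples, each with 4 inputs followed by 3 targets, stored as text.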
In ascii and sparse format, every line defines one example, with a single space character as the separator. All other formats are binary; that is, the numbers are stored contiguously without separators. Signed and unsigned integers are assumed to be little-endian, and floating point numbers are assumed to be in IEEE float or double format.
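To make the little-endian integer encoding concrete, here is a small sketch for the 16-bit formats. The function names are hypothetical and this is not the ReClaM reader itself; it only shows how two consecutive bytes from a binary file would combine into one value, independent of the host byte order.

```cpp
#include <cstdint>
#include <cstring>

// Assemble an unsigned 16-bit value from two little-endian bytes:
// bytes[0] is the low byte, bytes[1] is the high byte.
std::uint16_t decodeUint16LE(const unsigned char* bytes) {
    return static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Signed variant: reuse the unsigned decoding, then reinterpret the bit
// pattern as a two's complement int16_t via memcpy.
std::int16_t decodeInt16LE(const unsigned char* bytes) {
    const std::uint16_t u = decodeUint16LE(bytes);
    std::int16_t s;
    std::memcpy(&s, &u, sizeof s);
    return s;
}
```

For example, the byte pair 0x34 0x12 decodes to 0x1234, and the pair 0xFE 0xFF decodes to -2 when read as a signed value.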
In sparse format, all input values are assumed to be zero unless stated otherwise. Nonzero entries are given as pairs of the form "index:value", interpreted as "data(index) = value". A semicolon separates the data from the targets. The targets are NOT sparse encoded; that is, they are in standard ascii format.
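The sparse encoding can be sketched as follows. The function parseSparseLine is hypothetical and not part of ReClaM. It assumes zero-based indices and tolerates whitespace around the semicolon; neither detail is confirmed by the description above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Decode one sparse-format line, e.g. "1:2.5 3:-1 ; 0 1".
// Assumptions (not confirmed by the ReClaM documentation): indices are
// zero-based and whitespace around ';' is optional.
// Returns false on a missing semicolon, a token without ':', or an index
// outside [0, inputDim). Malformed numbers make std::stoul/std::stod throw.
bool parseSparseLine(const std::string& line, std::size_t inputDim,
                     std::vector<double>& data, std::vector<double>& target) {
    const std::string::size_type semi = line.find(';');
    if (semi == std::string::npos) return false;

    data.assign(inputDim, 0.0);  // every input defaults to zero
    target.clear();

    // Inputs: "index:value" pairs before the semicolon.
    std::istringstream inputs(line.substr(0, semi));
    std::string token;
    while (inputs >> token) {
        const std::string::size_type colon = token.find(':');
        if (colon == std::string::npos) return false;
        const std::size_t index = std::stoul(token.substr(0, colon));
        if (index >= inputDim) return false;
        data[index] = std::stod(token.substr(colon + 1));
    }

    // Targets: plain ascii numbers after the semicolon.
    std::istringstream targets(line.substr(semi + 1));
    double value;
    while (targets >> value) target.push_back(value);
    return true;
}
```

Under these assumptions, the line "1:2.5 3:-1 ; 0 1" with an input dimension of 5 yields the inputs (0, 2.5, 0, -1, 0) and the targets (0, 1).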

Definition at line 149 of file Dataset.h.


Constructor & Destructor Documentation

DataFile::DataFile ( const char *  filename  ) 

Constructor.

Definition at line 112 of file Dataset.cpp.

References file, and ReadHeaderLine().

DataFile::~DataFile (  ) 

Destructor.

Definition at line 125 of file Dataset.cpp.

References file.


Member Function Documentation

int DataFile::DiscardUntil ( const char *  separators  )  [protected]

Definition at line 395 of file Dataset.cpp.

References file.

Referenced by ReadExample(), and ReadHeaderLine().

bool DataFile::GetData ( Array< double > &  training_data,
Array< double > &  training_target,
int  training,
Array< double > &  test_data,
Array< double > &  test_target,
int  test,
bool  shuffle = false 
)

Deliver data from the file.

Definition at line 152 of file Dataset.cpp.

References currentExample, DataSource::dataDim, GetData(), numberOfExamples, ReadExample(), and DataSource::targetDim.

bool DataFile::GetData ( Array< double > &  data,
Array< double > &  target,
int  count 
) [virtual]

Deliver data from the file.

Implements DataSource.

Definition at line 135 of file Dataset.cpp.

References currentExample, DataSource::dataDim, i, numberOfExamples, ReadExample(), and DataSource::targetDim.

Referenced by Dataset::Dataset(), and GetData().

int DataFile::getNumberOfExamples (  )  [inline]

Return the number of examples available.

Definition at line 168 of file Dataset.h.

References numberOfExamples.

Referenced by Dataset::Dataset().

bool DataFile::ReadExample ( Array< double > &  data,
Array< double > &  target,
int  number 
) [protected]

Definition at line 237 of file Dataset.cpp.

References DataSource::dataDim, DataFile_ReadType, DiscardUntil(), format, i, ReadToken(), and DataSource::targetDim.

Referenced by GetData().

bool DataFile::ReadHeaderLine (  )  [protected]

int DataFile::ReadToken ( char *  buffer,
int  maxlength,
const char *  separators 
) [protected]

Definition at line 356 of file Dataset.cpp.

References file, and i.

Referenced by ReadExample(), and ReadHeaderLine().


Member Data Documentation

int DataFile::currentExample [protected]

number of examples already delivered

Definition at line 189 of file Dataset.h.

Referenced by GetData(), and ReadHeaderLine().

FILE* DataFile::file [protected]

open file stream

Definition at line 180 of file Dataset.h.

Referenced by DataFile(), DiscardUntil(), ReadHeaderLine(), ReadToken(), and ~DataFile().

int DataFile::format [protected]

0=ascii, 1=sparse, 2=float, 3=double

Definition at line 186 of file Dataset.h.

Referenced by ReadExample(), and ReadHeaderLine().

int DataFile::numberOfExamples [protected]

number of examples

Definition at line 183 of file Dataset.h.

Referenced by GetData(), getNumberOfExamples(), and ReadHeaderLine().


The documentation for this class was generated from the following files: