ReClaM Documentation

Reference for the Regression and Classification Models Toolbox.

Overview

ReClaM offers a rich set of methods for supervised learning, reaching from basic to state of the art tools. For example, the library provides simple linear models as well as different flavors of neural network and support vector machine models and training algorithms. Sophisticated model selection tools complete the collection.

This introduction is organized as follows:

The ReClaM Architecture

NOTE: In the transition from Shark version 1.4.x to 1.5.x there was a major change of the ReClaM architecture. For details on design goals and technical considerations refer to the following documents:

To achieve most flexibility, the ReClaM package is build like a construction kit. The main components are:

These components are modeled by the three ReClaM base classes

ReClaMBaseClasses.png

The organization of the ReClaM base classes with an illustration of the information flow.

A Typical ReClaM Application

To give a motivation and a context for the following introduction, we will very briefly outline the general structure of a typical supervised learning scenario, that is, data driven model adaptation, within the ReClaM architecture:
An instance of a problem specific Model subclass, an ErrorFunction and a suitable Optimizer object are constructed. The training data, that is, input patterns with labels, are loaded into the system. Model and optimizer are initialized. The optimization takes the form of a loop calling the optimizer. In each iteration the optimizer modifies the model parameters in order to achieve a smaller error function value than before. The error computation usually involves the model prediction on the data and the labels. This proceeding is illustrated by the following code sniplet:

 // prepare data
 Array<double> trainingData;
 Array<double> trainingTargets;
 LoadOrGenerateData(trainingData, trainingTargets);

 // prepare basic objects
 MyModel model;            // sub class of Model
 MyErrorFunction error;    // sub class of ErrorFunction
 MyOptimizer optimizer;    // sub class of Optimizer

 // optimize / learn
 optimizer.init(model);
 for (int i=0; i<ITERATIONS; i++)
 {
     optimizer.optimize(model, error, trainingData, trainingTargets);
 }

The base class interfaces provide all functionality that is needed for this general scheme. Therefore, the core of all ReClaM programs looks quite similar. Now, most of the ReClaM classes are specialized subclasses implementing models, error functions or optimizers. This document will first introduce the base class interfaces and then turn to the most prominent model types.

The Model Base Class

The Model class provides an interface for an adaptable data processing model. A typical model can be thought of as a parameterized family of functions or maps. The interface provides the following functionality:

A model needs input patterns to perform its computations. See the Model documentation for a complete list of models.

The ErrorFunction Base Class

The ErrorFunction class is a quite simple interface providing only the following functions:

An error function needs a model, input patterns and label information to perform its computations. See the ErrorFunction documentation for a complete list of error functions.

The Optimizer Base Class

The Optimizer class unifies standard initialization and iterative optimization steps of all ReClaM optimizers. It provides the following interface:

An optimizer needs a model, an error function as well as input patterns with labels to perform its computations. These computations result in a data driven adaptation of the model parameters in order to reduce the error evaluated on the training data. See the Optimizer documentation for a complete list of optimizers.

Linear Methods

ReClaM offers five simple linear models for learning in vector spaces:

The following standard learning methods operating on these models are available as optimizers:

These optimizers deviate from the standard ReClaM philosophy because they solve the underlying problems analytically, that is, without the need for an iterative update loop. In most cases, these algorithms can serve for preprocessing or as base line methods.

Neural Networks

Predefined Network Models

ReClaM comes with several predefined network types. These networks cover a wide variety of applications. Whenever these prototypes prove inappropriate they can hopefully serve as base classes or starting points for refined implementations. The following table lists several properties:

Name Type Activation Functions Training Error Measure
Hidden Neurons Output Neurons
FFNet Feed Forward
sigm1.gif
sigm1.gif
none
MSEFFNet Feed Forward
sigm1.gif
sigm1.gif
Mean Squared Error
LinOutFFNet Feed Forward
sigm1.gif
lin.gif
none
LinOutMSEBFFNet Feed Forward
sigm1.gif
lin.gif
Mean Squared Error
TanhNet Feed Forward
tanh.gif
tanh.gif
Squared Error
LinearOutputTanhNet Feed Forward
tanh.gif
lin.gif
Squared Error
ProbenNet Feed Forward
fabs.gif
lin.gif
Mean Squared Error
ProbenBNet Feed Forward
fabs.gif
lin.gif
Mean Squared Error
MSERNNet Recurrent
sigm1.gif
sigm1.gif
Mean Squared Error
RBFNet Radial Basis Function special special none
MSERBFNet Radial Basis Function special special Mean Squared Error

Connection matrix creation

The methods in the file createConnectionMatrix.h automatically create connection matrices for feed-forward and recurrent networks with several layers.
Flags are offered that can be set to establish standard connections between the neurons of the network.

Monitoring the training

The EarlyStopping class provides measures that can be used to fight overfitting during the training process.

Active Learning

By means of the VarianceEstimator class it is possible to enhance the learning speed of your network by allowing it to add a new pattern to the training set at each step. This pattern is chosen in a way that it will minimize the network error.

Kernel Methods

ReClaM offers a variety of kernel based learning algorithms. In the following the basic concepts and the corresponding classes are introduced.

svm.png

Kernel Functions

Kernel based learning algorithms rely on a positive definite kernel function taking two input patterns as arguments. The kernel function can be interpreted as first mapping the inputs into a (usually high dimensional) Hilbert space and then computing the inner product in this space. A kernel based algorithm is a learning algorithm that can be formulated in terms of inner products of pairs of input patterns. These inner products are then computed using a kernel function.

The KernelFunction interface encapsulates this functionality. Usually a parameterized family of kernel functions is considered. Because we are interested in the optimization of its parameters, KernelFunction is derived from Model. A kernel does not provide an output given an input. Therefore it defines a new interface:

Of course the KernelFunction inherits the Model members for retrieving and setting parameters and checking the feasibility of the parameters.

Several standard kernel functions are available, together with combinations of kernels to new kernels:

Class Parameters Formula Description
LinearKernel none $ \langle x, z \rangle = x^T z $ The linear kernel is the canonical inner product.
PolynomialKernel degree d and offset v $ (\langle x, z \rangle + v)^d $ The polynomial kernel is a polynomial of the linear kernel. This representation corresponds to a feature space of monomials of the input coordinates.
RBFKernel concentration $ \gamma $ $ \exp(-\gamma \|x-z\|^2) $ The Gaussian Radial Basis Function kernel is one of the most widely used kernels for vector valued data. It is known to be universal.
DiagGaussKernel diagonal inverse covariance matrix M $ \exp(-(x-z)^T M (x-z)) $ Gaussian kernel with adaptible covariance matrix. The extent of the Gaussian along each coordinate axis can be controlled.
GeneralGaussKernel inverse covariance matrix M $ \exp(-(x-z)^T M (x-z)) $ Gaussian kernel with adaptible covariance matrix. The complete shape of the Gaussian can be controlled.
NormalizedKernel parameters of a base kernel k $ \frac{k(x, z)}{\sqrt{k(x, x) \cdot k(z, z)}} $ The normalization of a positive definite kernel is again a valid positive definite kernel. The normalization ensures that the norm of each example in feature space is 1.
WeightedSumKernel non-negative weights wi and parameters of base kernels ki $ \sum_i w_i k_i(x, z) $ The non-negative linear combination of positive definite kernels is a valid positive definite kernel again.

Simple Algorithms

The most simple kernel based algorithms available are

Their standard vector space versions can be obtained by simply using the linear kernel. These algorithms can serve as a baseline for the evaluation of more advanced methods. A special aspect of these algorithms is that they do not have any hyperparameters and do not need any special optimization. That is, once the training data are provided the models can be used immediately.

Support Vector Machines

Support Vector Machines (SVMs) are the most prominent kernel based learning algorithms. The basic ideas of support vector machines are as follows:

SVM Model

Support vector machines usually result in a sparse affine sum of the form

\[ f(x) = \sum_{i \in S} \alpha_i k(x_i, x) + b \]

where the support vector index set S is a subset of the training data indices. In the case of classification with more than two classes the expansion becomes

\[ f(x) = \sum_{i \in S} \sum_c \alpha_{i,c} k(x_i, x) y_c + b \]

where c sums over the classes and $ y_i $ are class prototype vectors.

In ReClaM, a support vector machine is a Model holding the coefficients $ \alpha_i $ and $ b $. Its prediction can be restricted to the sign of the computed value, which is the standard form for binary classification problems. For multi-class problems, the output vector can analogously be transformed into an integer class label. In contrast to other ReClaM models, support vector machines output only one value per input pattern. They can be viewed as affine linear functions mapping from the span of the support vectors in feature space to the real line (or to $R^n$).

SVM Training

Training an SVM involves several so called hyperparameters. These are encapsulated by the MetaSVM class and its descendants:

Class Parameters Problem Type Description
C_SVM regularization parameter C binary classification C-SVM formulation using either 1-norm or 2-norm slack penalty. This is the most usual way to train a support vector machine.
Epsilon_SVM regularization parameter C, accuracy parameter $ \varepsilon $ regression Standard SVM for regression
RegularizationNetwork regularization parameter $ \gamma $ regression regularized linear regression in feature space
GaussianProcess none regression Bayesian inference is used to estimate the regularization and kernel parameters.
OneClassSVM regularization parameter $ \nu $ support / density quantile estimation Estimation of an area of high density
AllInOneMcSVM regularization parameter C multi-category classification Covers the methods by Weston and Watkins and by Wahba (without bias)
CrammerSingerMcSVM regularization parameter $ \beta $ multi-category classification Multi-Class SVM by Crammer and Singer (without bias)
OVAMcSVM regularization parameter C multi-category classification A simple one-versus-all machine (without bias)
BinaryCostMcSVM regularization parameter C multi-category classification Multi-class classification at the training cost of a binary machine (without bias)

For the adaptation of the hyperparameters of the C_SVM or the Epsilon_SVM, please refer to the model selection section below.

SVMs are trained solving a special quadratic program. For this purpose ReClaM comes with a various efficient solvers for the particular types of optimization problems. The quadratic programming is hidden from the user of the library. An SVM is trained using the special SVM_Optimizer class. This class constructs a QuadraticProgram object, solves the corresponding problem and copies the solution into the SVM parameter vector. The optimization involves a special quadratic objective function. For efficiency reasons it is not desirable to define this objective function as an ErrorFunction derived class and use a standard purpose gradient based optimizer. Because of this special situation the SVM_Optimizer can be called without an error function parameter. When called with an error function through the standard Optimizer interface the error function is ignored.

Model Selection

The process of model selection is the choice of a model from a family of candidate models. This may correspond to the selection of the hyperparameters of a MetaSVM subclass. In this case the model family is parameterized by the MetaSVM parameter vector.

General Model Selection

A model selection problem is usually solved by an outer optimization loop investigating a model family and an inner optimization loop finding the best model within this family. The fitness of a model family is simply the fitness of its best element.

In general, model selection can be carried out on complicated spaces like the possible topologies of a neural network. These problems require specialized evolutionary algorithms not provided by the ReClaM library. However, in the case of support vector machines the model selection problem can usually be reduced to real valued parameter adaptation, to which ReClaM is perfectly suited. Therefore most model selection support is available for this model type. Only the cross validation method is applicable to general model selection problems.

Model Selection for Support Vector Machines

The model of a support vector machine is made up by the kernel function and the complexity control. The kernel parameters and the SVM training parameters are to be chosen. These parameters are captured by the MetaSVM class and its subclasses.

Several objective functions have been proposed for model selection for support vector machines. Besides the usual cross validation measures ReClaM provides the following ErrorFunction subclasses:

Example Programs

To give you a better idea, how using the several components of ReClaM to build your own networks, you can take a look at some commented source codes of example programs for different types of predefined networks. To run these programs please go to the examples directory of ReClaM.

Please have a look at our introductory tutorials on using multi-layer perceptron networks, part 1 and part 2.

Models, Error Functions, and Optimizers

Models

The following table lists the most prominent ReClaM model types:

Name Inputs Outputs Description
LinearMap $ R^n $ $ R^m $ Linear mapping $ x \mapsto y = A x $. An affine version is available, too.
LinearFunction $ R^n $ $ R $ Special case of a linear map with one output dimension. An affine version is available, too.
ComponentWiseModel $ X^k $ $ Y^k $ Apply a base model $ f : X \rightarrow Y $ $k$-fold and component wise, making up a model $ F : X^k \rightarrow Y^k $.
ConcatenatedModel $ X $ $ Y $ Apply a chain $ f_1 \circ f_2 \circ \dots \circ f_k $ of models. The output space of model $ f_i $ and the input space of model $ f_{i+1} $ must match.
FFNet $ R^n $ $ R^m $ feed forward neural network
RBFNet $ R^n $ $ R $ radial basis function network
MSERNNet $ R^n $ $ R^m $ recurrent neural network with built-in mean squared error computation
KernelFunction $ X \times X $ $ R $ Please refer to the Kernel Methods documentation.
KernelMeanClassifier $ X $ $ R $ Simple mean classifier in a kernel defined feature space
KernelNearestNeighbor $ X $ $ \{-1, +1\} $ Simple k-nearest-neighbor classifier in a kernel defined feature space
SVM $ X $ $ \{-1, +1\} $ or $ R $ Support Vector Machine model, that is, affine linear function in a kernel defined feature space.
MultiClassSVM $ X $ $ \{0, \dots, \#classes-1\} $ or $ R^n $ Support Vector Machine model, that is, affine linear function in a kernel defined feature space.
SvmApproximationModel Used internally. Please refer to the SvmApproximation class for documentation.
MetaSVM --- --- Base class of all SVM training schemes. See the MetaSVM documentation or the section Support Vector Machines for details.
SigmoidModel R (0, 1) sigmoidal function $ f(x) = \frac{1}{1 + \exp(ax+b)} $





Error Functions

This table gives an overview over the error functions available:

Class Differentiable Restrictions Description
SquaredError yes none squared error between model output and target
MeanSquaredError / DF_MeanSquaredError yes none mean squared error between model output and target
CrossEntropy / DF_CrossEntropy yes unit interval model outputs cross entropy measure for neural network classifier training
CrossEntropyIndependent / DF_CrossEntropyIndependent yes unit interval model outputs cross entropy measure for neural network classifier training with non exclusive attributes
ClassificationError no Classifiers fraction of misclassified patterns
BalancedClassificationError no Classifiers mean fraction of misclassified patterns per class
ErrorPercentage no none error percentage based on the mean squared error
RadiusMargin yes KernelFunction derived classes and C_SVM with 2-norm slack penalty squared radius margin quotient for SVM model selection
NegativeKTA yes KernelFunction derived classes and C_SVM with 2-norm slack penalty negative kernel target alignment
NegativeBKTA yes KernelFunction derived classes and C_SVM with 2-norm slack penalty negative balanced kernel target alignment
NegativePolarization yes KernelFunction derived classes and C_SVM with 2-norm slack penalty negative kernel polarization measure
SpanBound no SVM classifier span bound for a support vector machine classifier model selection
LOO no SVM classifier leave one out error especially tuned towards support vector machines
InverseClassSeparability yes KernelFunction derived classes inverse of the class separability measure termed J, by Huilin Xiong and M. N. S. Swamy





Optimizers

Among others, here come the most used optimizers:

Class Parameters Gradient-Based? Description
IRpropPlus initial step sizes, increase and decrease factors, minimal and maximal step sizes yes Improved resilent backpropagation with weight backtracking. Other variants without improvement or backtracking are available.
CG line search type and several tuning parameters with default values yes Conjugate Gradient method
BFGS line search type and several tuning parameters with default values yes Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
CMAOptimizer CMA type, initial standard deviations no Covariance Matrix Adaptation Evolution Strategy. This is a wrapper class for the EALib CMA implementation. The CMA is the first choice optimizer if the derivative is not available or lots of undesirable local minima are a concern.
GridSearch grid definition no Grid search is the most basic optimization method available. Several variants add flexibility and/or improve efficiency.

Further features

ReClaM offers a bunch of other models, error functions, and optimizers, as well as tools, e.g. for data handling, unified exception handling, and even more. Check out the class or file lists for a full overview.

For example, have a look at the tools defined in

Maybe the fastest way to get a grip on ReClaM is to have a look at the examples.