FAQ - Frequently Asked Questions
The following list of frequently asked questions may prove helpful if you experience any problems with the Shark machine learning library. Please consult this FAQ before reporting any problems or bugs.
Question
Is there a mailing list available for users of the Shark library?
Answer
Yes, the mailing list is available at https://lists.sourceforge.net/lists/listinfo/shark-project-user.
Question
What are the differences between Shark and other libraries? Why should I use Shark?
Answer
Shark is a native C++ library designed for development and real-world application of state-of-the-art machine learning and optimization algorithms. The library has a history of more than 10 years of successful applications. It is actively supported and still growing. We are continuously extending and improving the algorithms in various domains of machine learning and computational intelligence.
Flexibility and speed are the main design criteria (see the question "How fast is the Shark?"). We think that its flexibility and extensibility make Shark stand out from other libraries.
It is self-contained and offers computational intelligence techniques such as single- and multi-objective evolutionary algorithms and neural networks as well as kernel-based machine learning methods and classical optimization techniques in a coherent framework. This is unique.
Shark is an object-oriented software library, and using it requires knowledge of C++ programming. If a graphical user interface is important to you, you may prefer other machine learning software (or feel free to contribute such a front-end for Shark).
Shark implements many powerful algorithms that are not available in any other machine learning library, in particular methods based on the research of the developers.
Some highlights:
- The Shark SVM is the only SVM package implementing the fastest SMO-based learning algorithm for dense large-scale problems (using hybrid maximum gain working set selection).
- Shark provides a variety of model-selection algorithms for SVMs, for example gradient-based optimization of the kernel-target alignment, which is not available in any other library.
- Shark provides a large collection of efficient gradient-based optimization techniques, for example the frequently applied iRprop+, a fast and robust method not available in other machine learning libraries.
- We do not know of any software library for single-objective evolutionary algorithms that comes close to the EALib in terms of variety and quality of algorithms for real-valued optimization.
- To our knowledge, the MOO-EALib is the most comprehensive library for evolutionary multi-objective optimization. The efficient implementation of the hypervolume metric (S or Lebesgue measure) and of the powerful MO-CMA-ES are special features.
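For readers unfamiliar with the hypervolume metric mentioned in the last item, the following is a minimal sketch for the two-objective minimization case. It is not the MOO-EALib implementation and does not use the Shark API; the names Point2D and hypervolume2D are hypothetical, and the simple sort-and-sweep approach only works in two dimensions.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A point in a two-objective minimization problem:
// (first objective, second objective). Hypothetical type, not part of Shark.
typedef std::pair<double, double> Point2D;

// Hypothetical helper: area dominated by the given points and bounded by the
// reference point ref. Points that are not strictly better than ref in both
// objectives contribute nothing.
double hypervolume2D(std::vector<Point2D> points, const Point2D& ref)
{
    // Sort by the first objective (ties broken by the second objective).
    std::sort(points.begin(), points.end());

    double volume = 0.0;
    double bestSecond = ref.second; // best second objective seen so far

    for (std::size_t i = 0; i < points.size(); ++i) {
        const Point2D& p = points[i];
        if (p.first >= ref.first)
            continue; // outside the region bounded by the reference point
        if (p.second < bestSecond) {
            // p adds a new rectangular slice to the dominated area.
            volume += (ref.first - p.first) * (bestSecond - p.second);
            bestSecond = p.second;
        }
        // Otherwise p is dominated by an earlier point and adds nothing.
    }
    return volume;
}
```

For the points (1,3), (2,2), (3,1) and the reference point (4,4), the function returns 6. In three or more dimensions a plain sweep like this is no longer sufficient, which is why dedicated algorithms are used there.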
Question
How fast is the Shark?
Answer
Shark is a C++ library because we aim at high performance. Flexibility and speed are the main design criteria of the library. However, these objectives sometimes conflict, and in some cases we had to sacrifice speed for flexibility. For example:
- Support Vector Machines (SVMs) were added after examining the best implementations we could find on the web. In Shark, the kernel function is an object passed to the learning machine at runtime, while in the competing SVM code the kernel is a fixed function and you have to recompile the code if you want to change it. Thus, for medium-sized problems, Shark is negligibly slower because of the object-oriented architecture. However, because Shark is the only library implementing a special strategy (HMG working set selection) for large-scale problems, it is significantly faster than the competing library for large datasets.
- The modular architecture of ReClaM allows you to instantiate neural networks (NNs) with arbitrary topology, different error functions, and various learning algorithms at runtime. This modular architecture results in communication costs. Further, the NNs rely on Shark's convenient Array data structure, which is appropriate for dense structures and dense data, but not optimal for sparse data. Thus, there may be faster network implementations, but the highly efficient learning algorithms in ReClaM compensate for that. ReClaM can also export NNs after training as independent, plain C source code, which is as fast as it can get.
- We did a simple test to demonstrate the performance of the SVM quadratic program solver, compared to LIBSVM (which is the best reference available). We generated a toy dataset as follows:

      #include <ReClaM/ArtificialDistributions.h>

      int main(int argc, char** argv)
      {
          // Draw the toy dataset from a chessboard distribution.
          Chessboard chess(3, 4);
          Dataset ds(chess, 100000, 0);
          // Save the data once for Shark and once in LIBSVM format.
          ds.Save("chessboard.shark.data", true, false, "ascii");
          ds.SaveLIBSVM("chessboard.libsvm.data", true, false);
      }

  This poses a very difficult problem for SMO-based solvers, and the problem size makes sure that the initial kernel matrix does not fit into memory, such that the HMG working set selection strategy is active in Shark. We ran both LIBSVM 2.85 and Shark 2.1.0 on this problem with the settings
  - C-SVM with 1-norm slack penalty, C=1,000,000
  - Gaussian RBF kernel with parameter gamma=1
  - 256 MB of kernel cache (default for the Shark SVM)
  - Accuracy epsilon=0.001 (default)
  The resulting runtimes were:
  - 31493 seconds (about 8 hours 45 minutes) for Shark 2.1.0
  - 40902 seconds (about 11 hours 22 minutes) for LIBSVM 2.85
- Performance matters not only in ReClaM but also in the EALib and MOO-EALib. For example, the functions for computing the hypervolume (Lebesgue measure) in the MOO-EALib are fast. Consider the benchmark data set ran.40000pts.3d.1 provided by the Walking Fish Group, which contains 40000 non-dominated points in three dimensions. Running the implementation of the algorithm by Overmars and Yap on a MacBook with a 2.16 GHz Intel Core 2 Duo and 2 GB RAM takes 4.18652 seconds to compute the hypervolume.
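The flexibility versus speed trade-off described for the SVM kernel in the first item above can be sketched as follows. This is an illustration only: the classes Kernel, GaussianKernel and LinearKernel and the function kernelRowSum are hypothetical and are not Shark's actual interface. The Gaussian kernel is assumed to use the common parameterization exp(-gamma * ||a - b||^2).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Hypothetical kernel interface (not Shark's actual class hierarchy).
class Kernel {
public:
    virtual ~Kernel() {}
    // Both vectors are assumed to have the same length.
    virtual double eval(const Vec& a, const Vec& b) const = 0;
};

// Gaussian RBF kernel, assumed form: exp(-gamma * ||a - b||^2).
class GaussianKernel : public Kernel {
public:
    explicit GaussianKernel(double gamma) : gamma_(gamma) {}
    double eval(const Vec& a, const Vec& b) const
    {
        double dist2 = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            dist2 += d * d;
        }
        return std::exp(-gamma_ * dist2);
    }
private:
    double gamma_;
};

// Linear kernel: the plain inner product of the two vectors.
class LinearKernel : public Kernel {
public:
    double eval(const Vec& a, const Vec& b) const
    {
        double dot = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            dot += a[i] * b[i];
        return dot;
    }
};

// Stand-in for a learning machine's inner loop. The kernel is an object
// chosen at runtime, so each evaluation costs one virtual function call.
double kernelRowSum(const Kernel& k, const std::vector<Vec>& data, const Vec& x)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i)
        sum += k.eval(data[i], x);
    return sum;
}
```

Calling kernelRowSum with a GaussianKernel or a LinearKernel switches the kernel without recompiling anything. A design with a hard-wired kernel would write the formula directly into the loop, which avoids the virtual call but requires editing and recompiling the code to change the kernel.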
Installation
Question
How do I install the Shark library?
Answer
Please refer to the section Getting Started.
Question
Do I need root/Administrator access to install the Shark library?
Answer
No, root/Administrator access is not required. However, if you want to install the library to a central location (like /usr/lib/ on Linux), you will of course need write access to that directory.
Question
I get an "internal compiler error". What can I do?
Answer
Under Windows, using the Microsoft Developer Studio, this error appears regularly without any obvious reason. You can do the following:
- Be sure to have all service packs installed.
- Delete all temporary files and restart the compilation in batch mode.
Question
Do I need Qt and Qwt? Where can I get these libraries?
Answer
No, you do not need these libraries unless you want to compile the graphical examples. The libraries are available from:
Question
I get strange warnings when I compile Shark using a certain Microsoft compiler. What should I do?
Answer
Don't worry, ignore them.
Packages
Question
What happened to the Fuzzy library?
Answer
The Fuzzy library for multi-valued logic and fuzzy control, which was available as an add-on package, was removed from the library some years ago. Recently the library was revised and we included a simple version in the current release. However, the Fuzzy module is still in the beta stage.
Question
What happened to the [...] library, the [...] add-on package, the function [...]?
Answer
- Some libraries and functions were dropped in Shark 2.0.0, either because they were not used by many people, the implemented algorithms were outdated, or the code did not pass the code review.
- In Shark 2.1.0, all libraries were merged into one big Shark library. No one seemed to install just one component of Shark, and the separate libraries required a lot of typing in Makefiles etc.
- The add-on packages are no longer maintained together with the main library.
- In Shark 2.1.1 the GUI example package was added to the main library package.