Introduction
This tutorial gives some simple examples of how to write an mlpack binding that can be compiled for multiple languages. These bindings make up the core of how most users will interact with mlpack.
mlpack provides the following:
- mlpack::Log, for debugging / informational / warning / fatal output
- mlpack::IO, for parsing command line options or other option
Each of those classes are well-documented, and that documentation should be consulted for further reference.
First, we'll discuss the logging infrastructure, which is useful for giving output that users can see.
Simple Logging Example
mlpack has four logging levels:
- Log::Debug
- Log::Info
- Log::Warn
- Log::Fatal
Output to Log::Debug does not show (and has no performance penalty) when mlpack is compiled without debugging symbols. Output to Log::Info is only shown when the program is run with the –verbose
(or -v
) flag. Log::Warn is always shown, and Log::Fatal will throw a std::runtime_error exception, after a newline is sent to it. If mlpack was compiled with debugging symbols, Log::Fatal will also print a backtrace, if the necessary libraries are available.
Here is a simple example binding, and its output. Note that instead of int
main()
, we use static
void
mlpackMain()
. This is because the automatic binding generator (see mlpack automatic bindings to other languages) will set up the environment and once that is done, it will call mlpackMain()
.
#define BINDING_TYPE BINDING_TYPE_CLI
static void mlpackMain()
{
Log::Debug <<
"Compiled with debugging symbols." << std::endl;
Log::Info <<
"Some test informational output." << std::endl;
Log::Fatal <<
"Program has crashed." << std::endl;
}
Assuming mlpack is installed on the system and the code above is saved in test.cpp
, this program can be compiled with the following command:
$ g++ -o test test.cpp -DDEBUG -g -rdynamic -lmlpack
Since we compiled with -DDEBUG
, if we run the program as below, the following output is shown:
$ ./test --verbose
[DEBUG] Compiled with debugging symbols.
[INFO ] Some test informational output.
[WARN ] A warning!
[FATAL] [bt]: (1) /absolute/path/to/file/example.cpp:6: function()
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
what(): fatal error; see Log::Fatal output
Aborted
The flags -g
and -rdynamic
are only necessary for providing a backtrace. If those flags are not given during compilation, the following output would be shown:
$ ./test --verbose
[DEBUG] Compiled with debugging symbols.
[INFO ] Some test informational output.
[WARN ] A warning!
[FATAL] Cannot give backtrace because program was compiled without: -g -rdynamic
[FATAL] For a backtrace, recompile with: -g -rdynamic.
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
what(): fatal error; see Log::Fatal output
Aborted
The last warning is not reached, because Log::Fatal terminates the program.
Without debugging symbols (i.e. without -g
and -DDEBUG
) and without –verbose, the following is shown:
$ ./test
[WARN ] A warning!
[FATAL] Program has crashed.
terminate called after throwing an instance of 'std::runtime_error'
what(): fatal error; see Log::Fatal output
Aborted
These four outputs can be very useful for both providing informational output and debugging output for your mlpack program.
Simple IO Example
Through the mlpack::IO object, command-line parameters can be easily added with the BINDING_NAME, BINDING_SHORT_DESC, BINDING_LONG_DESC, BINDING_EXAMPLE, BINDING_SEE_ALSO, PARAM_INT, PARAM_DOUBLE, PARAM_STRING, and PARAM_FLAG macros.
Here is a sample use of those macros, extracted from methods/pca/pca_main.cpp. (Some details have been omitted from the snippet below.)
BINDING_NAME("Principal Components Analysis");
"An implementation of several strategies for principal components analysis "
"(PCA), a common preprocessing step. Given a dataset and a desired new "
"dimensionality, this can reduce the dimensionality of the data using the "
"linear transformation determined by PCA.");
"This program performs principal components analysis on the given dataset "
"using the exact, randomized, randomized block Krylov, or QUIC SVD method. "
"It will transform the data onto its principal components, optionally "
"performing dimensionality reduction by ignoring the principal components "
"with the smallest eigenvalues.");
"https://en.wikipedia.org/wiki/Principal_component_analysis");
"@doxygen/classmlpack_1_1pca_1_1PCA.html"));
PARAM_INT_IN(
"new_dimensionality",
"Desired dimensionality of output dataset.",
"d", 0);
static void mlpackMain()
{
arma::mat& dataset = IO::GetParam<arma::mat>("input");
size_t newDimension = IO::GetParam<int>("new_dimensionality");
...
if (IO::HasParam("output"))
IO::GetParam<arma::mat>("output") = std::move(dataset);
}
Documentation is automatically generated using those macros, and when the program is run with –help the following is displayed:
$ mlpack_pca --help
Principal Components Analysis
This program performs principal components analysis on the given dataset. It
will transform the data onto its principal components, optionally performing
dimensionality reduction by ignoring the principal components with the
smallest eigenvalues.
Required options:
--input_file [string] Input dataset to perform PCA on.
--output_file [string] Matrix to save modified dataset to.
Options:
--help (-h) Default help info.
--info [string] Get help on a specific module or option.
Default value ''.
--new_dimensionality [int] Desired dimensionality of output dataset.
Default value 0.
--verbose (-v) Display informational messages and the full
list of parameters and timers at the end of
execution.
The mlpack::IO documentation can be consulted for further and complete documentation. Also useful is to look at other example bindings, found in src/mlpack/methods/
.