ML-CPP: Machine Learning Library with C++ Backend
Thumbnail image credit: mikemacmarketing, Wikimedia Commons
ML-CPP is a high-performance machine learning library that combines the computational efficiency of C++ with the user-friendly interface of Python. The project demonstrates how to bridge these two worlds effectively, creating a machine learning framework that’s both powerful and easy to use.
Key Highlights
- Designed and implemented a modular C++ machine learning library with Python bindings
- Successfully engineered two core algorithms:
- Linear regression with gradient descent optimization
- Neural network with customizable architecture and multiple activation functions
- Achieved 95.67% accuracy on binary classification tasks, comparable to scikit-learn’s 96.33%
- Demonstrated the viability of C++ backend for machine learning applications
Technical Features
Linear Regression Implementation
- Gradient descent optimization with configurable learning rate and convergence criteria
- Full vector/matrix support for multivariate regression
- Comprehensive evaluation metrics including MSE and R-squared
Neural Network Implementation
- Fully customizable architecture with support for any number of hidden layers
- Multiple activation functions (Sigmoid, ReLU, and Tanh)
- Mini-batch stochastic gradient descent optimization
- Xavier/Glorot weight initialization for improved training stability
Performance Analysis
- Neural Network Accuracy: 95.67% accuracy (compared to scikit-learn’s 96.33%)
- Activation Functions: Tanh achieved 100% accuracy on XOR problem, while Sigmoid and ReLU reached 75%
- Training Speed: While ~40x slower than scikit-learn’s optimized implementation on large datasets, the implementation is efficient for small to medium datasets
- Memory Usage: Lower memory footprint than equivalent scikit-learn models
Implementation Highlights
The project demonstrates several advanced techniques:
- C++/Python Interoperability: Seamless integration between C++ and Python using pybind11
- Vector Mathematics: Custom implementation of linear algebra operations for machine learning
- Gradient Descent Optimization: Implementation of both full-batch and mini-batch versions
- Backpropagation Algorithm: From-scratch implementation for neural network training
Code Example
// Create a neural network with 2 inputs, 8 hidden neurons, and 1 output
mlcpp::models::NeuralNetwork model(
{2, 8, 1},
mlcpp::models::ActivationFunction::TANH,
0.01, // learning rate
1000 // max iterations
);
// Train the model
model.fit(X_train, y_train);
// Make predictions
std::vector<double> predictions = model.predictBinary(X_test);
Technologies Used
- C++14: Core implementation language
- Python 3.x: Interface language and comparison testing
- pybind11: For C++/Python bindings
- CMake: Build system
- scikit-learn: For benchmarking and comparison
Learnings and Future Directions
The project demonstrated both the advantages and challenges of implementing machine learning algorithms in C++:
- C++ provides fine-grained control over memory and computation
- The implementation gap between scikit-learn and our library highlights the extensive optimization in mature ML libraries
- Future work could include SIMD optimizations, GPU acceleration, and additional algorithms