Hyperspherical Description for Target Datasets with Applications in Novelty Detection and Classification (SVDD)

Resource Overview

An implementation of hyperspherical description for target datasets, enabling novelty detection and classification through Support Vector Data Description (SVDD).

Detailed Documentation

Support Vector Data Description (SVDD) is a data description method that constructs a minimum-volume hypersphere in feature space to enclose a target dataset. Unlike conventional discriminative classifiers, SVDD is a one-class classification technique, which makes it well suited to anomaly detection and novel-class identification. The core implementation solves a quadratic optimization problem via Lagrange multipliers to determine the optimal sphere center and radius.

A key advantage of SVDD is its ability to handle non-linearly separable data through the kernel trick. The algorithm minimizes the hypersphere volume while keeping most target points inside the boundary; slack variables allow a controlled fraction of training points to fall outside. At prediction time, points outside the hypersphere are flagged as anomalies or novel instances. In Python, scikit-learn does not ship a dedicated SVDD estimator, but OneClassSVM with an RBF (Radial Basis Function) kernel solves an equivalent problem, and custom SVDD classes with other kernel functions are straightforward to write.

For multi-class datasets, SVDD employs a "one-vs-rest" strategy in which an independent SVDD model is built for each class. This characterizes each class's features separately and gives every category its own dedicated decision boundary, significantly enhancing flexibility in multi-class classification tasks. The implementation trains a separate SVDD model per class and aggregates their scores at inference time.

In practical applications, parameter selection, particularly the kernel function and the slack penalty, critically impacts model performance. The Gaussian (RBF) kernel is frequently adopted for its excellent mathematical properties, while the regularization parameter (C in the SVDD formulation, or nu in the equivalent OneClassSVM) controls the model's tolerance to outliers. A typical implementation tunes the kernel parameters through grid search and cross-validation to balance boundary tightness against generalization capability.
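The one-class novelty-detection behavior described above can be sketched with scikit-learn's OneClassSVM, which with an RBF kernel solves a problem equivalent to SVDD with a Gaussian kernel. This is a minimal sketch on synthetic data; the gamma and nu values are illustrative assumptions, not tuned settings:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))            # target (normal) data
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)),  # more normal points
                    rng.normal(6.0, 1.0, size=(5, 2))])  # clear outliers

# With an RBF kernel, OneClassSVM is equivalent to SVDD with a Gaussian kernel;
# nu upper-bounds the fraction of training points treated as boundary violations.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X_train)

# +1 = inside the learned hypersphere boundary, -1 = novelty/anomaly
pred = model.predict(X_test)
```

Points far from the training cloud receive negative decision-function values and are labeled -1, mirroring the "outside the hypersphere" criterion in the description.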
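The parameter-tuning step can be sketched as a manual grid search over gamma and nu. Since only target data is available, one simple heuristic (an assumption of this sketch, not a prescribed procedure) is to score each candidate by how many held-out target points it accepts plus how many synthetic far-away points it rejects:

```python
import numpy as np
from itertools import product
from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(300, 2))          # target data only
X_tr, X_val = train_test_split(X, test_size=0.3, random_state=0)

# Synthetic wide-spread points used as proxy outliers for scoring
proxies = rng.normal(0.0, 5.0, size=(100, 2))

best, best_score = None, -np.inf
for gamma, nu in product([0.01, 0.1, 1.0], [0.01, 0.05, 0.1]):
    m = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_tr)
    accept = (m.predict(X_val) == 1).mean()      # keep the sphere tight enough
    reject = (m.predict(proxies) == -1).mean()   # but not so loose it accepts everything
    score = accept + reject
    if score > best_score:
        best_score, best = score, (gamma, nu)
```

With labeled outliers or a full cross-validation split, the same loop generalizes to k-fold scoring; scikit-learn's GridSearchCV can also be used with a custom scorer.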