Multiple Kernel Learning: A Linear Combination Model for Kernel Fusion with Computational Efficiency Enhancements

Resource Overview

Multiple kernel learning (MKL) integrates several base kernels through a linear combination, but conventional solvers are slow because they compute the kernel matrices explicitly. We propose explicitly approximating each kernel's mapping function in a finite-dimensional space, solving the resulting SVM with dual coordinate descent, and learning the kernel weights with group Lasso regularization during alternating optimization.
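
As a quick illustration, the sketch below assembles this pipeline from scikit-learn components: `RBFSampler` (random Fourier features) stands in for the explicit finite-dimensional kernel map, and `LinearSVC`'s liblinear backend performs dual coordinate descent on the resulting linear problem. The kernel weights `beta` are fixed placeholder values here, not the ones the alternating optimization would learn.

```python
# Minimal sketch of the pipeline: approximate feature maps + linear SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# One approximate feature map per base RBF kernel (different bandwidths).
maps = [RBFSampler(gamma=g, n_components=200, random_state=0)
        for g in (0.1, 1.0, 10.0)]
beta = np.array([0.5, 0.3, 0.2])  # hypothetical kernel weights, sum to 1

# The combined kernel k(x, z) = sum_m beta_m * k_m(x, z) corresponds to
# concatenating sqrt(beta_m) * phi_m(x) blocks in the primal feature space.
Z = np.hstack([np.sqrt(b) * m.fit_transform(X) for b, m in zip(beta, maps)])

clf = LinearSVC(dual=True).fit(Z, y)  # liblinear: dual coordinate descent
print("training accuracy:", clf.score(Z, y))
```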

Detailed Documentation

In this text, we describe a multiple kernel learning model that merges multiple kernels via a linear combination. Conventional solvers are typically slow because they compute the kernel matrices explicitly. To address this, we introduce a method that explicitly approximates each kernel's mapping function in a finite-dimensional space. We then solve the SVM with dual coordinate descent, maintaining the solution in the primal so that explicit kernel evaluations are never needed. During the alternating optimization, we apply group Lasso regularization to the kernel weights: penalizing the sum of the L2-norms of the per-kernel weight blocks (an L1 penalty across groups) drives whole blocks to zero, effectively automating kernel selection.

This work originated as a byproduct of our research collaboration with Dr. Ye Yiren and Professor Wang from Academia Sinica. Specifically, our research explores several aspects of multiple kernel learning models. We investigate how to select among diverse kernel types (e.g., linear, polynomial, RBF) and how to determine kernel combinations automatically, for instance via cross-validation or Bayesian optimization. We also examine the scalability of multiple kernel learning on large datasets, proposing efficient computational strategies such as kernel approximation with random Fourier features or Nyström methods, which replace the O(n²) kernel matrix with D-dimensional feature maps computable in time linear in the number of samples. Additionally, we apply the model to domains such as image recognition (using histogram intersection kernels for feature fusion) and natural language processing (e.g., combining syntactic and semantic kernels).

In summary, our research highlights the significance of multiple kernel learning in machine learning and introduces improvements for efficiency and adaptability. We hope these contributions provide practical guidance and inspiration for future research and applications. Since scikit-learn does not ship a ready-made MKL estimator, implementations can be assembled from its kernel-approximation and linear-SVM components, or built as custom solvers using stochastic gradient descent for large-scale data; the sketches below illustrate these building blocks.
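
To make the group Lasso step concrete, the following sketch shows two standard operations on the primal weight vector, assumed to be partitioned into one block per kernel: block soft-thresholding (the proximal operator of the group Lasso penalty), and a closed-form kernel-weight update proportional to the block norms, as used by L1-norm MKL solvers. Both are generic illustrations, not necessarily this resource's exact update rules.

```python
# Sketch of the group Lasso machinery, assuming the primal weight vector w
# is partitioned into one block w_m per kernel.
import numpy as np

def group_soft_threshold(w_blocks, lam):
    """Proximal operator of lam * sum_m ||w_m||_2: shrinks each block's
    norm by lam and zeroes out entire kernels whose norm falls below lam."""
    out = []
    for w in w_blocks:
        norm = np.linalg.norm(w)
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append(scale * w)
    return out

def update_kernel_weights(w_blocks):
    """Closed-form weight update common in L1-norm MKL alternation:
    beta_m proportional to ||w_m||_2, normalized to sum to one."""
    norms = np.array([np.linalg.norm(w) for w in w_blocks])
    total = norms.sum()
    return norms / total if total > 0 else norms

blocks = [np.array([3.0, 4.0]), np.array([0.3, 0.4]), np.zeros(2)]
print(group_soft_threshold(blocks, lam=1.0))  # small blocks shrink to zero
print(update_kernel_weights(blocks))          # sparse, normalized weights
```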
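
The cross-validated kernel selection and linear-time approximation mentioned above can likewise be sketched with scikit-learn: `Nystroem` and `RBFSampler` both produce D-dimensional feature maps in O(nD) time rather than forming the O(n²) kernel matrix, and `GridSearchCV` selects among approximation methods, bandwidths, and SVM regularization strengths. Parameter values below are illustrative.

```python
# Sketch of cross-validated selection among approximate kernel maps.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

pipe = Pipeline([("map", RBFSampler()), ("svm", LinearSVC(dual=True))])
grid = {
    # Swap the approximation method itself as a grid dimension.
    "map": [RBFSampler(n_components=200, random_state=0),
            Nystroem(kernel="rbf", n_components=200, random_state=0)],
    "map__gamma": [0.1, 1.0, 10.0],  # RBF bandwidth per base kernel
    "svm__C": [0.1, 1.0, 10.0],      # SVM regularization strength
}
search = GridSearchCV(pipe, grid, cv=3).fit(X, y)
print(search.best_params_, search.best_score_)
```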