Gene Classification on DNA Microarrays Using the RFE-SVM Algorithm

Resource Overview

Genes are first filtered with a t-test; the samples are then classified with the Recursive Feature Elimination-Support Vector Machine (RFE-SVM) algorithm.

Detailed Documentation

In this document, we first use a t-test to select significant features. For each gene, the test assesses whether mean expression differs between two sample groups; in Python this is typically done with scipy.stats.ttest_ind(). Genes whose p-values fall below a predetermined threshold (e.g., p < 0.05) are retained for subsequent analysis.
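The filtering step can be sketched as follows. This is a minimal illustration, not the resource's actual code: the helper name ttest_select, the synthetic data, and the choice of Welch's unequal-variance test (equal_var=False) are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_select(X, y, alpha=0.05):
    """Return indices of genes (columns) with a two-group t-test p-value below alpha.

    Hypothetical helper: X is (n_samples, n_genes), y holds exactly two class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("ttest_select expects exactly two classes")
    group_a = X[y == classes[0]]
    group_b = X[y == classes[1]]
    # One test per gene: axis=0 compares the groups column by column.
    _, p_values = ttest_ind(group_a, group_b, axis=0, equal_var=False)
    selected = np.flatnonzero(p_values < alpha)
    return selected, p_values


# Synthetic stand-in for a microarray matrix (assumed data, not real measurements).
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :5] += 3.0  # make the first five genes differentially expressed

selected, p_values = ttest_select(X, y, alpha=0.05)
print("genes kept:", selected.size, "of", n_genes)
```

With thousands of genes, an uncorrected threshold of 0.05 lets through a number of genes by chance alone, so a multiple-testing correction may be worth considering before the next step.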

Next, we apply the RFE-SVM algorithm for classification. RFE-SVM trains a linear SVM, ranks the features by the magnitude of their weights, removes the lowest-ranked features, and repeats on the reduced set. It can be implemented with sklearn.svm.SVC using a linear kernel, wrapped in RFE from sklearn.feature_selection. The features that survive are those that contribute most to separating the classes, which can improve classification performance.
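A minimal sketch of this step with scikit-learn is shown below. The synthetic data, the value C=1.0, and the target of five features are assumptions chosen for illustration; in practice the input would be the t-test-filtered expression matrix.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic two-class data standing in for filtered expression values (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, :3] += 3.0  # three informative features

# A linear kernel is required so that the SVM exposes per-feature weights (coef_).
svm = SVC(kernel="linear", C=1.0)

# step=1 removes one feature per iteration and refits the SVM each time.
rfe = RFE(estimator=svm, n_features_to_select=5, step=1)
rfe.fit(X, y)

kept = np.flatnonzero(rfe.support_)  # indices of the surviving features
ranking = rfe.ranking_               # 1 = kept; larger = eliminated earlier
predictions = rfe.predict(X)         # classifies using only the kept features

print("kept features:", kept)
```

Predictions on the training data, as here, overstate accuracy. A fairer estimate would run the selection inside each cross-validation fold, for example by placing RFE in a sklearn.pipeline.Pipeline.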

Because the SVM is retrained after each removal, the weights are re-estimated on the remaining features at every step, and the order of elimination yields a full importance ranking. The resulting dimensionality reduction can improve classification accuracy, particularly when genes far outnumber samples, as is typical of microarray data.
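To make the elimination loop itself explicit, the sketch below writes it out by hand for the two-class case, where it should behave much like sklearn's RFE with step=1. The function name manual_svm_rfe and the synthetic data are hypothetical, and the squared weight is used as the ranking criterion as an assumption.

```python
import numpy as np
from sklearn.svm import SVC


def manual_svm_rfe(X, y, n_keep, C=1.0):
    """Hypothetical hand-written elimination loop for a two-class problem.

    Returns the surviving column indices and the eliminated ones,
    in order of removal (least important first).
    """
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        # Retrain on the current feature subset so weights are re-estimated.
        model = SVC(kernel="linear", C=C).fit(X[:, remaining], y)
        weights = model.coef_.ravel()        # one weight per remaining feature
        worst = int(np.argmin(weights ** 2)) # smallest squared weight
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Synthetic data (assumed), with two informative features.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 12))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :2] += 3.0

remaining, eliminated = manual_svm_rfe(X, y, n_keep=4)
print("kept:", remaining)
print("removal order:", eliminated)
```

Removing one feature per pass is the most faithful form of the procedure but also the slowest; on large gene sets, removing a fraction of the features per pass is a common trade-off.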