Classification Experiment Using Logistic Regression

Resource Overview

A classification experiment based on logistic regression, tested on the UCI Adult dataset, with implementation details and a performance evaluation.

Detailed Documentation

We conducted a classification experiment using logistic regression and tested it on the UCI Adult dataset. The dataset contains personal information drawn from US Census Bureau data, including features such as age, education level, marital status, and occupation. Our goal was to use these features to predict whether an individual's income exceeds $50K per year.

We began with exploratory data analysis to understand feature distributions and correlations, then preprocessed the data by handling missing values and encoding categorical variables. The logistic regression model passes a linear combination of the features through the sigmoid function to estimate the probability of the positive class, and its parameters are fitted with an iterative optimizer such as gradient descent or the Newton-Raphson method.

We trained and tested the model using k-fold cross-validation and evaluated it with accuracy, precision, and recall. Model optimization involved hyperparameter tuning (e.g., regularization strength) and feature selection to improve prediction accuracy. The logistic regression model performed well on the Adult dataset, and its coefficients gave a clear, interpretable view of feature importance.
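The pipeline described above can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the code used in the experiment: it uses batch gradient descent (not Newton-Raphson), an L2 penalty as the regularization term, and a small synthetic two-feature dataset in place of the preprocessed Adult data. All function names (`train_logreg`, `k_fold_scores`, `make_toy_data`, and so on) and all default hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=200, l2=0.01):
    """Fit weights and bias by batch gradient descent on the L2-penalized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1 | x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus the L2 penalty on the weights (bias is not penalized)
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return hard 0/1 labels by thresholding the predicted probability."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def k_fold_scores(X, y, k=5, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, repeat for each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        w, b = train_logreg([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict([X[i] for i in fold], w, b)
        scores.append(metrics([y[i] for i in fold], y_pred))
    return scores


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for preprocessed features; this is NOT the Adult dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for fold, (acc, prec, rec) in enumerate(k_fold_scores(X, y), start=1):
        print(f"fold {fold}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

On real data, the features would first need the preprocessing mentioned above (missing values handled, categorical variables encoded, and ideally numeric columns scaled so a single learning rate suits every feature). The fitted weights `w` are the coefficients one would inspect for feature importance, and `l2` is the regularization strength one would tune.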