*Result*: A novel multi-estimator framework for sequential forward feature selection in cancer prediction.

Title:
A novel multi-estimator framework for sequential forward feature selection in cancer prediction.
Authors:
Misra, Puneet (AUTHOR) misra_puneet@lkouniv.ac.in, Yadav, Arun Singh (AUTHOR) arun.ai.lkouniv@gmail.com
Source:
AIP Conference Proceedings. 2026, Vol. 3398 Issue 1, p1-10. 10p.
Database:
Academic Search Index

*Further Information*

*Cancer remains a major global health challenge and one of the leading causes of death worldwide. Early and accurate prediction is essential for effective diagnosis and treatment, and ultimately for improving patient survival rates. In recent years, machine learning techniques have emerged as powerful tools for disease prediction. Cancer prediction is challenging because of the disease's complexity and heterogeneity, which highlights the need for effective feature selection methods. Feature selection enhances model performance by reducing dimensionality and improving interpretability. However, relying on a single estimator for feature selection may not capture all relevant patterns in the data, potentially limiting the model's predictive capabilities. This study proposes a novel multi-estimator framework that uses Sequential Forward Feature Selection (SFS) to optimize feature subsets for breast cancer prediction. The proposed method runs SFS with multiple estimators, namely Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT), to identify the most significant feature subset for each model. By considering the unique perspectives of different algorithms, the approach aims to enhance overall classification performance. The framework uses the Wisconsin Breast Cancer Dataset (WBCD) from the UCI Machine Learning Repository. The WBCD is highly imbalanced; hence, the Synthetic Minority Over-sampling Technique (SMOTE) was applied for class balancing, followed by standard scaling for normalization. Experimental results show that SFS with LR, NB, SVM, and DT selected 9, 10, 15, and 18 features, respectively, as the optimal subsets. Although SVM had the highest accuracy (97.90%), LR, with 97.72% accuracy, was preferred for its simplicity, interpretability, and use of fewer features. Using the selected features, the optimized LR model achieved an accuracy of 97.90%, a precision of 97.33%, a recall of 98.60%, and an AUC of 99.85%, outperforming NB and closely rivaling the more complex SVM model. An ANN attained slightly higher accuracy at 98.60%, but the LR model offered comparable performance with significant advantages in simplicity and computational efficiency. The findings demonstrate that the proposed multi-estimator framework enhances the classification performance of simpler models through effective feature selection. By assessing the importance of variables across several algorithms, the study shows that even basic models such as LR can achieve high predictive accuracy, making them strong alternatives to complex models in breast cancer prediction. [ABSTRACT FROM AUTHOR]*
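
The per-estimator selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes the 30-feature diagnostic version of the Wisconsin data as bundled with scikit-learn, uses scikit-learn's `SequentialFeatureSelector`, and fixes each subset size to the count reported in the abstract instead of searching for it. The SMOTE balancing step is omitted so that the sketch depends only on scikit-learn. The 3-fold cross-validation, default hyperparameters, and the `estimators` and `selected` names are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Wisconsin diagnostic data (569 samples, 30 features).
data = load_breast_cancer()

# Standard scaling applied up front, mirroring the order given in the abstract.
# (Scaling inside a per-fold pipeline would avoid leakage across CV folds.)
X = StandardScaler().fit_transform(data.data)
y = data.target

# Estimator paired with a subset size; sizes are the counts reported in the abstract.
# Hyperparameters are library defaults, chosen here for illustration only.
estimators = {
    "LR": (LogisticRegression(max_iter=1000), 9),
    "NB": (GaussianNB(), 10),
    "SVM": (SVC(), 15),
    "DT": (DecisionTreeClassifier(random_state=0), 18),
}

selected = {}
for name, (estimator, n_features) in estimators.items():
    # Forward selection: start from no features and greedily add the one
    # that most improves cross-validated accuracy for this estimator.
    sfs = SequentialFeatureSelector(
        estimator,
        n_features_to_select=n_features,
        direction="forward",
        scoring="accuracy",
        cv=3,  # assumption: 3 folds to keep the sketch quick
    )
    sfs.fit(X, y)
    selected[name] = list(data.feature_names[sfs.get_support()])
    print(name, len(selected[name]), selected[name])
```

Each estimator ends up with its own feature subset, which is the point of the multi-estimator design: a feature that helps a linear model may contribute little to a tree, and the reverse.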