Abstract
Given the increasing complexity of omics datasets, a key challenge is not only to improve classification performance but also to enhance the transparency and reliability of model decisions. Effective model performance and feature selection are fundamental to explainability and reliability. High-dimensional omics datasets often contain only a limited number of samples owing to factors such as clinical constraints, patient conditions, and phenotype rarity. Current omics-based classification models often offer limited interpretability, making it difficult to extract meaningful insights in settings where trust and reproducibility are critical. This study presents a machine learning-based classification framework that integrates feature selection with data augmentation techniques to achieve high classification accuracy while preserving interpretability. Using the publicly available dataset E-MTAB-8026, we perform a bootstrap analysis in six binary classification scenarios to evaluate the proposed model’s behaviour. Our findings emphasize the fundamental balance between accuracy and feature selection and highlight the positive effect of introducing synthetic data on generalization, even in scenarios with very limited sample availability.
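The combination of feature selection, synthetic-data augmentation, and bootstrap evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the classifier, the selection criterion, or the augmentation method, so the nearest-centroid classifier, the mean-difference feature ranking, the Gaussian-jitter augmentation, the out-of-bag evaluation, and all function names and parameter values below are assumptions chosen for brevity. The toy data is randomly generated and unrelated to E-MTAB-8026.

```python
import random
import statistics


def select_features(X, y, k):
    """Rank features by absolute difference of class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def augment(X, y, n_new, noise, rng):
    """Append n_new synthetic samples: noisy copies of random training rows."""
    X_aug, y_aug = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(X))
        X_aug.append([v + rng.gauss(0.0, noise) for v in X[i]])
        y_aug.append(y[i])
    return X_aug, y_aug


def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids, features, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def bootstrap_accuracy(X, y, k=2, n_boot=50, n_new=20, noise=0.1, seed=0):
    """Mean out-of-bag accuracy over bootstrap resamples (assumed protocol)."""
    rng = random.Random(seed)
    n = len(X)
    accuracies = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        y_train = [y[i] for i in idx]
        # Skip resamples with no held-out samples or only one class.
        if not oob or len(set(y_train)) < 2:
            continue
        X_train = [X[i] for i in idx]
        # Augmentation and selection use training data only, to avoid leakage.
        X_train, y_train = augment(X_train, y_train, n_new, noise, rng)
        features = select_features(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, features)
        hits = sum(predict(centroids, features, X[i]) == y[i] for i in oob)
        accuracies.append(hits / len(oob))
    return statistics.fmean(accuracies)


# Toy data: 12 samples, 5 features; only features 0 and 1 carry class signal.
_rng = random.Random(42)
y = [0] * 6 + [1] * 6
X = [
    [(label * 3.0 if j < 2 else 0.0) + _rng.gauss(0.0, 0.5) for j in range(5)]
    for label in y
]

print("selected features:", sorted(select_features(X, y, 2)))
print("mean out-of-bag accuracy:", bootstrap_accuracy(X, y))
```

Selecting features inside each bootstrap iteration, rather than once on the full dataset, keeps the held-out samples from influencing which features are chosen; whether the published framework does the same is not stated in the abstract.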
| Original language | English |
|---|---|
| Publication status | Published - 17 Jul 2025 |
| Event | 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society: EMBC 25, Copenhagen, Denmark. Duration: 14 Jul 2025 → 17 Jul 2025. https://embc.embs.org/2025/ |
Conference
| Conference | 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society |
|---|---|
| Abbreviated title | EMBC 25 |
| Country/Territory | Denmark |
| City | Copenhagen |
| Period | 14/07/25 → 17/07/25 |
| Internet address | https://embc.embs.org/2025/ |