Abstract
Inflammatory Bowel Disease (IBD) is commonly assessed through endoscopy, but manual interpretation suffers from interobserver variability and limited scalability. Deep learning models offer a path toward standardizing evaluations, yet their effectiveness is constrained by limited task-specific data and the complexity of video-based scoring. Transfer learning from large image datasets such as ImageNet is often used to address data scarcity, but such general-purpose features may not align well with medical imagery. This paper investigates the effectiveness of domain-specific pretraining for endoscopic video classification under weak supervision. We apply a multiple instance learning (MIL) framework to classify inflammation status from endoscopy videos using a range of deep learning architectures pretrained on ImageNet, general medical images, or the domain-specific GastroNet5M dataset. Our findings show that models pretrained on endoscopy-specific data consistently outperform general-purpose models on both internal and external datasets, achieving higher F1 score and AUC values. These results highlight the importance of domain-aligned feature representations and weakly supervised learning strategies in medical video analysis.
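As a rough illustration of the weakly supervised setup described in the abstract, the sketch below treats a video as a bag of per-frame feature vectors and aggregates them into a single video-level prediction. The abstract does not state which MIL aggregator the paper uses, so attention-based pooling is an assumption here, and all names (`attention_mil_pool`, `classify_video`, `attn_weights`, `clf_weights`) are hypothetical. In practice the frame features would come from a pretrained backbone; here they are plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    frame_features: Sequence[Sequence[float]],
    attn_weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool a bag of frame features into one vector.

    Assumption: a single linear attention scorer (attn_weights) stands in
    for whatever aggregator the paper actually uses.
    """
    # One scalar relevance score per frame.
    scores = [sum(w * x for w, x in zip(attn_weights, f)) for f in frame_features]
    alphas = softmax(scores)
    dim = len(frame_features[0])
    # Attention-weighted average of the frame features.
    pooled = [
        sum(a * f[d] for a, f in zip(alphas, frame_features)) for d in range(dim)
    ]
    return pooled, alphas


def classify_video(
    frame_features: Sequence[Sequence[float]],
    attn_weights: Sequence[float],
    clf_weights: Sequence[float],
    clf_bias: float = 0.0,
) -> Tuple[float, List[float]]:
    """Return a video-level probability and the per-frame attention weights."""
    pooled, alphas = attention_mil_pool(frame_features, attn_weights)
    logit = sum(w * x for w, x in zip(clf_weights, pooled)) + clf_bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, alphas


if __name__ == "__main__":
    # Toy bag: two uninformative frames and one strongly activated frame.
    frames = [[0.0, 0.0], [0.0, 0.0], [5.0, 1.0]]
    prob, alphas = classify_video(frames, [2.0, 0.0], [1.0, 0.0], clf_bias=-1.0)
    print(prob, alphas)
```

Only the video-level label is needed to train such a model, which is what makes the supervision weak; the attention weights additionally indicate which frames drove the prediction. The sketch assumes a non-empty bag.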
| Original language | English |
|---|---|
| Publication status | Accepted/In press - 25 Jul 2025 |
| Event | 32nd International Conference on Neural Information Processing, Okinawa Institute of Science and Technology, Okinawa, Japan. Duration: 20 Nov 2025 → 25 Nov 2025. https://iconip2025.apnns.org/ |
Conference
| Conference | 32nd International Conference on Neural Information Processing |
|---|---|
| Abbreviated title | ICONIP 2025 |
| Country/Territory | Japan |
| City | Okinawa |
| Period | 20/11/25 → 25/11/25 |
| Internet address | https://iconip2025.apnns.org/ |