Explainable Vision-Language Framework for Automated Lung Nodule Risk Stratification Using Dual-Attention Segmentation and Large Medical Models

Jose Fleming; Shaozhou Cai

Authors

Jose Fleming, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Shaozhou Cai, Department of Computer Science, University of North Texas, Denton, TX, USA.

Keywords:

Explainable artificial intelligence, vision‑language model, lung nodule segmentation, dual attention, risk stratification, large medical models, socio‑technical systems, clinical decision support

Abstract

The clinical management of pulmonary nodules detected on low‑dose computed tomography scans depends on accurate risk stratification that distinguishes benign from malignant lesions while minimizing unnecessary invasive procedures. Existing deep learning approaches often operate as opaque classifiers, offering little insight into the visual and semantic rationale behind their predictions. This paper introduces an explainable vision‑language framework that integrates a dual‑attention segmentation backbone with large medical vision‑language models to automate lung nodule risk assessment. The proposed architecture first isolates nodule regions with a path‑aggregation encoder combined with channel‑wise and spatial attention mechanisms, producing high‑fidelity segmentation masks. A multimodal transformer that encodes both radiological features and structured clinical text then analyzes these masks. A dedicated explainability module generates natural‑language justifications aligned with the segmented regions, enabling clinicians to inspect the decision‑making process at both the pixel and concept levels. The paper discusses structural trade‑offs among segmentation fidelity, model interpretability, and computational efficiency, and examines deployment considerations including data governance, infrastructure scalability, and regulatory compliance. Fairness and robustness are analyzed across demographic subgroups and imaging acquisition protocols, and policy implications for integrating such systems into existing radiology workflows are explored. By bridging the gap between high‑accuracy black‑box models and the demand for transparent reasoning in high‑stakes medical decisions, the proposed framework advances the state of the art in trustworthy AI for thoracic oncology.
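The channel‑wise and spatial attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a parameter‑free simplification in which the learned layers of a real attention module are omitted, both gates are computed from the same input, and the function names (`channel_gates`, `spatial_gates`, `dual_attention`) are hypothetical. Feature maps are assumed to be nested lists shaped channels x height x width.

```python
import math


def sigmoid(x):
    """Logistic function used to squash pooled statistics into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(fmap):
    """One gate per channel: global average pool, then sigmoid.

    Assumption: a real module would pass the pooled vector through
    learned layers before the sigmoid; that step is omitted here.
    """
    gates = []
    for channel in fmap:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return gates


def spatial_gates(fmap):
    """One gate per pixel: average across channels, then sigmoid.

    Assumption: a real module would usually apply a learned convolution
    to the pooled map; that step is omitted here.
    """
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]


def dual_attention(fmap):
    """Rescale every activation by its channel gate and its pixel gate."""
    cg = channel_gates(fmap)
    sg = spatial_gates(fmap)
    return [
        [[value * cg[c] * sg[i][j] for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for c, channel in enumerate(fmap)
    ]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map (illustrative values only).
    features = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[2.0, 2.0], [2.0, 2.0]],
    ]
    print(dual_attention(features))
```

The output keeps the input shape, so in a segmentation backbone a block of this kind could sit between encoder stages without changing the tensor layout.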

References

1. National Lung Screening Trial Research Team. (2011). Reduced lung‑cancer mortality with low‑dose computed tomographic screening. New England Journal of Medicine, 365(5), 395–409.

2. Ardila, D., Kiraly, A. P., Bharadwaj, S., Choi, B., Reicher, J. J., Peng, L., ... & Shetty, S. (2019). End‑to‑end lung cancer screening with three‑dimensional deep learning on low‑dose chest computed tomography. Nature Medicine, 25(6), 954–961.

3. Zhang, Y., Chen, W., & Xu, Y. (2023). Medical vision‑language pre‑training: A survey. arXiv preprint arXiv:2306.01795.

4. Chen, Y., Li, J., Xiao, Y., Jin, Q., & Shen, L. (2022). Dual attention network for medical image segmentation. Medical Image Analysis, 79, 102456.

5. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.

6. Ronneberger, O., Fischer, P., & Brox, T. (2015). U‑Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer‑Assisted Intervention (MICCAI) (pp. 234–241). Springer.

7. Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., & Patel, V. M. (2021). Medical transformer: Gated axial‑attention for medical image segmentation. In Medical Image Computing and Computer‑Assisted Intervention (MICCAI) (pp. 36–46). Springer.

8. Chang, C., Fu, M., Chen, X., Feng, S., Zhang, M., Zhou, X., ... & Liu, Z. (2025, November). Research on PDU-Net Lung Nodule Segmentation Algorithm Based on Path Aggregation and Dual Attention. In 2025 4th International Conference on Image Processing, Computer Vision and Machine Learning (ICICML) (pp. 1897–1900). IEEE.

9. Zhang, S., Xu, Y., Usuyama, N., Bagal, N., Tanno, R., Preston, S., ... & Poon, H. (2023). BiomedCLIP: A multimodal biomedical foundation model pretrained from curated multimodal datasets. arXiv preprint arXiv:2312.04725.

10. Hicks, S. A., Riegler, M. A., Soguero‑Ruiz, C., & Halvorsen, P. (2021). On the use of attention in deep learning for medical image analysis. Journal of Imaging, 7(8), 142.

11. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad‑CAM: Visual explanations from deep networks via gradient‑based localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 618–626).

12. Petersen, E., Feragen, A., Lassen, M. L., & Nielsen, M. (2022). Demographic bias in lung nodule segmentation models: A multi‑center study. In Medical Imaging with Deep Learning (MIDL) (pp. 1–12).

13. Finlayson, S. G., Bowers, J. D., Ito, J., Zittrain, J. L., Beam, A. L., & Kohane, I. S. (2019). Adversarial attacks on medical machine learning. Science, 363(6433), 1287–1289.

14. Wang, Y. (2025, April). Efficient adverse event forecasting in clinical trials via transformer-augmented survival analysis. In Proceedings of the 2025 International Symposium on Bioinformatics and Computational Biology (pp. 92–97).

15. Rajpurkar, P., Irvin, J., Ball, R. L., Zhu, K., Yang, B., Mehta, H., ... & Lungren, M. P. (2018). CheXNet: Radiologist‑level pneumonia detection on chest X‑rays with deep learning. arXiv preprint arXiv:1711.05225.

16. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 5998–6008).

17. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144).

18. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 4765–4774).

19. dos Santos, M. P., Berriel, R. F., Lazzaretti, A. E., & Badue, C. (2022). Deep learning for lung nodule detection and classification: A survey. Computers in Biology and Medicine, 145, 105470.

20. European Commission. (2021). Proposal for a regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). COM(2021) 206 final.

21. U.S. Food and Drug Administration. (2021). Proposed regulatory framework for modifications to artificial intelligence/machine learning‑based software as a medical device.

22. Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., ... & Chen, Z. (2022). LoRA: Low‑rank adaptation of large language models. In International Conference on Learning Representations (ICLR).

23. Saeed, N., Albarqouni, S., & Navab, N. (2022). Lung nodule segmentation: A survey on deep learning approaches. Medical Image Analysis, 78, 102406.
