Cross-Scale Path Aggregation Transformer for Joint Pulmonary Lesion Segmentation and Gene Expression Pattern Prediction in Precision Oncology

Vikram A. Batra; Jan L. Bush

Authors

Vikram A. Batra, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA.
Jan L. Bush, Department of Computer Science, Colorado State University, Fort Collins, CO, USA.

Keywords:

cross-scale aggregation, transformer, pulmonary lesion segmentation, gene expression prediction, precision oncology, path aggregation, joint learning

Abstract

Precision oncology increasingly relies on integrating radiological phenotypes with molecular signatures to guide individualized treatment. Deep learning has achieved strong performance both in pulmonary lesion segmentation and in predicting gene expression patterns from medical images, but the two tasks are typically addressed in isolation, leaving unexploited the shared representations that could improve both anatomical delineation and molecular characterization. This paper presents a system-level analysis of a novel architecture that performs pulmonary lesion segmentation and gene expression pattern prediction jointly through a cross-scale path aggregation transformer. The design couples a hierarchical vision transformer backbone with a bidirectional path aggregation mechanism that fuses multi-scale feature maps without relying on plain skip connections alone, so that fine-grained boundary information and high-level semantic abstractions are refined simultaneously. A dual-head decoder produces a dense segmentation mask and a vector of predicted expression levels for clinically relevant oncogenes. We examine the structural trade-offs inherent in multi-task training, including gradient interference, loss weighting strategies, and latent representation entanglement, and we describe how cross-scale aggregation reduces representation misalignment across tasks. Beyond model architecture, we discuss deployment considerations such as computational footprint, robustness under domain shift, federated learning for privacy-preserving multi-institutional collaboration, and alignment with regulatory frameworks. By situating the technical contribution within a broader socio-technical infrastructure, we address fairness, interpretability, and sustainability requirements. The discussion offers a forward-looking perspective on how tightly coupled imaging–genomic models can be operationalized in clinical workflows while maintaining safety, equity, and governance standards.
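
As an illustration of the loss weighting trade-off mentioned in the abstract, the following is a minimal sketch of one common strategy, a simplified form of uncertainty-based task weighting, applied to a segmentation loss and an expression regression loss. It is not taken from the paper: the function names (`soft_dice_loss`, `mse_loss`, `joint_loss`), the choice of Dice and mean squared error, and the log-variance parameters are all assumptions made for the example.

```python
import math
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Soft Dice loss over flattened mask probabilities in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over a vector of predicted expression levels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(seg_loss: float, expr_loss: float,
               log_var_seg: float = 0.0, log_var_expr: float = 0.0) -> float:
    """Combine two task losses with per-task log-variance weights.

    Hypothetical simplified form: each loss is scaled by exp(-s) and the
    term s is added as a penalty, so a task cannot be ignored for free.
    In training, s would be a learnable parameter per task.
    """
    return (math.exp(-log_var_seg) * seg_loss + log_var_seg
            + math.exp(-log_var_expr) * expr_loss + log_var_expr)


# Toy values only; these are not results from the paper.
seg = soft_dice_loss([0.9, 0.1, 0.8], [1.0, 0.0, 1.0])
expr = mse_loss([1.0, 2.0], [1.0, 4.0])
total = joint_loss(seg, expr, log_var_expr=math.log(2.0))
```

With both log-variances at zero the combined loss reduces to a plain sum; raising a task's log-variance down-weights that task's loss at the cost of the additive penalty, which is one way to reduce the influence of a noisier task during joint training.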

