Cross-Modal Deep Hashing for Medical Image–Text Retrieval via Self-Supervised Asymmetric Semantic Excavation

Ronald Garker; Viktor D. Lindberg; Lars Kelley

Authors

Ronald Garker, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.
Viktor D. Lindberg, Department of Computer Science, Binghamton University, Binghamton, NY, USA.
Lars Kelley, Department of Computer Science, University of New Hampshire, Durham, NH, USA.

Keywords:

cross-modal retrieval; deep hashing; self-supervised learning; medical image–text; asymmetric semantic excavation; healthcare infrastructure; fairness; policy

Abstract

The integration of medical imaging and unstructured clinical narratives has created a multimodal data landscape that promises transformative diagnostic and research capabilities, yet the sheer scale, heterogeneity, and privacy sensitivity of this data challenge conventional retrieval systems. This paper presents a systems-level analysis of cross-modal deep hashing frameworks that exploit self-supervised asymmetric semantic excavation to enable efficient medical image–text retrieval. We depart from traditional symmetric learning paradigms and examine how asymmetric network architectures, combined with margin-scalable semantic constraints, can excavate latent correspondences from unlabeled radiological archives without relying on costly manual annotations. The discussion extends beyond algorithmic novelty to encompass structural trade-offs in system architecture, including the design of modality-specific encoders, hash code binarization pipelines, and distributed retrieval topologies suitable for hospital information systems. We scrutinize the interplay between retrieval precision and computational efficiency, highlighting how compact binary hash codes can reduce storage footprints by orders of magnitude relative to real-valued embeddings, and how mapping the high-dimensional joint embedding space into a compact Hamming space makes nearest-neighbor search fast and, with suitable indexing, sublinear in database size. Critical attention is devoted to robustness under domain shift caused by varied imaging equipment and heterogeneous reporting styles, as well as to fairness concerns that arise when retrieval performance varies across demographic subgroups, a matter of acute importance in clinical decision support. Furthermore, we address infrastructure governance, the sustainability of deep hashing model lifecycles, and the policy implications of deploying self-supervised retrieval tools within regulated healthcare environments.
By situating cross-modal deep hashing within a broad socio-technical framework, the paper argues that self-supervised asymmetric semantic excavation offers a viable trajectory toward scalable, interpretable, and ethically grounded medical information access, provided that system design accounts for clinical workflows, regulatory compliance, and long-term maintainability.
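The storage and search claims in the abstract can be made concrete with a small sketch. It assumes the encoders have already produced real-valued embeddings (the 8-dimensional vectors below are invented for illustration), binarizes them with the sign function, packs each code into a Python integer, and ranks database codes by Hamming distance computed as the popcount of an XOR. The helper names (`binarize`, `hamming`, `rank_by_hamming`, `storage_ratio`) are hypothetical and not taken from the paper. The ranking shown is an exhaustive scan; sublinear search would additionally require an index over the codes, such as hash-table lookup within a small Hamming radius or multi-index hashing.

```python
from typing import List, Sequence, Tuple


def binarize(embedding: Sequence[float]) -> int:
    """Sign-binarize a real-valued embedding into a packed integer code.

    Bit i is set (sign +1) when embedding[i] >= 0, otherwise cleared (sign -1).
    Mapping exact zeros to +1 is an arbitrary convention chosen for this sketch.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by distance, ties broken by index.

    This is a linear scan; it does not by itself give sublinear search.
    """
    scored = [(idx, hamming(query, code)) for idx, code in enumerate(database)]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


def storage_ratio(dim: int, float_bits: int, code_bits: int) -> float:
    """How many times smaller a code_bits hash is than a dim-d float vector."""
    return (dim * float_bits) / code_bits


# Hypothetical embeddings standing in for image- and report-encoder outputs.
image_query = binarize([0.3, -1.2, 0.8, 0.1, -0.4, 0.9, -0.7, 0.2])
report_codes = [
    binarize([0.5, -0.9, 0.7, 0.2, -0.1, 0.6, -0.3, 0.4]),    # same signs as query
    binarize([-0.5, 0.9, -0.7, -0.2, 0.1, -0.6, 0.3, -0.4]),  # every sign flipped
    binarize([0.5, -0.9, 0.7, 0.2, 0.1, 0.6, -0.3, 0.4]),     # one sign differs
]

print(rank_by_hamming(image_query, report_codes))  # [(0, 0), (2, 1), (1, 8)]
print(storage_ratio(dim=512, float_bits=32, code_bits=64))  # 256.0
```

As a rough storage illustration, a 512-dimensional float32 embedding occupies 16,384 bits, so a 64-bit code is 256 times smaller, a little over two orders of magnitude. The dimensionality and code length are illustrative choices, not values reported for the method.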
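The fairness concern raised in the abstract, that retrieval quality may differ across demographic subgroups, can be audited by stratifying a standard retrieval metric by subgroup. The sketch below computes mean average precision (mAP) per subgroup and the largest gap between any two subgroups. The group labels and relevance lists are invented for illustration, and the function names are hypothetical; a real audit would rely on held-out queries with clinically validated relevance judgments and should report uncertainty, since small subgroups yield noisy estimates.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision of one ranked result list.

    relevance[k] is 1 if the (k+1)-th retrieved item is relevant, else 0.
    Normalised by the number of relevant items present in the list, so for a
    full ranking of the database it equals standard AP.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def subgroup_map(queries: List[Tuple[str, Sequence[int]]]) -> Dict[str, float]:
    """Mean AP per subgroup; each query is a (group_label, relevance_list) pair."""
    per_group: Dict[str, List[float]] = defaultdict(list)
    for group, relevance in queries:
        per_group[group].append(average_precision(relevance))
    return {group: sum(aps) / len(aps) for group, aps in per_group.items()}


def max_gap(scores: Dict[str, float]) -> float:
    """Largest difference in mAP between any two subgroups."""
    return max(scores.values()) - min(scores.values())


# Hypothetical ranked relevance lists for image-to-report queries.
queries = [
    ("group_a", [1, 1, 0, 0]),  # AP = (1/1 + 2/2) / 2 = 1.0
    ("group_a", [0, 1, 0, 1]),  # AP = (1/2 + 2/4) / 2 = 0.5
    ("group_b", [0, 0, 1, 0]),  # AP = (1/3) / 1 = 0.333...
]
scores = subgroup_map(queries)
print({group: round(value, 3) for group, value in scores.items()})
# {'group_a': 0.75, 'group_b': 0.333}
print(round(max_gap(scores), 3))  # 0.417
```

Reporting the maximum gap alongside per-group scores keeps a single aggregate mAP from masking a subgroup for which retrieval is markedly worse.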
