Privacy-Preserving Medical Knowledge Retrieval with Self-Supervised Hash Learning and Adversarial Defense for Healthcare AI Agents

Authors

  • Zhoukai Xue, School of Computing, Clemson University, Clemson, SC, USA.
  • Troy Fields, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.
  • Ferry Hansson, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.
  • Dean Wabb, Department of Computer Science, University of North Texas, Denton, TX, USA.

Keywords

Privacy-preserving retrieval; self-supervised hashing; adversarial defense; healthcare AI agents; medical knowledge graphs; secure multi-party computation

Abstract

The rapid integration of artificial intelligence agents into clinical decision support systems has intensified the demand for medical knowledge retrieval pipelines that simultaneously ensure high-fidelity semantic access, rigorous privacy protection, and resilience against adversarial manipulation. This paper presents a system-level architectural investigation into privacy-preserving medical knowledge retrieval that unifies self-supervised hash learning with layered adversarial defense mechanisms. By mapping medical knowledge fragments into compact binary codes through self-supervised contrastive objectives, the framework enables efficient approximate nearest-neighbor search over distributed repositories without exposing sensitive clinical content. The proposed architecture couples a differentially private hash encoder with secure index structures and an adversarial sanitization module that monitors query integrity and defends against both input-space perturbations and knowledge-poisoning attacks. We examine structural trade-offs between hash code length, retrieval precision, computational latency, and achievable privacy guarantees, drawing on cross-domain insights from cryptographically secure computation, federated learning, and adversarial robustness literature. Deployment considerations are analyzed within the context of hospital information ecosystems and cross-border regulatory regimes, addressing governance challenges such as algorithmic auditing, fairness across heterogeneous patient populations, and the sustainability of large-scale hash-based retrieval infrastructure. The study further explores forward-looking policy implications for certifying autonomous healthcare AI agents that rely on privacy-preserving retrieval as a core cognitive operation. The analysis demonstrates that self-supervised hashing constitutes a promising foundation for trust-enhancing knowledge access, yet requires careful co-design with adversarial defense, governance frameworks, and lifecycle management to withstand emerging threat surfaces in medical decision-making environments.
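
As a rough illustration of the retrieval step summarized above (embeddings mapped to compact binary codes, then ranked by Hamming distance), the following minimal sketch may help. It is not the paper's method: random-hyperplane hashing stands in for the learned self-supervised encoder, and the differential privacy, secure index, and adversarial sanitization components are omitted entirely. All names (`make_hyperplanes`, `hash_code`, `search`, the `frag_*` identifiers) and the toy 4-dimensional embeddings are assumptions made for the example.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (a stand-in for a trained hash encoder)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vec, planes):
    """Pack one sign bit per hyperplane into a single integer code."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def search(query_code, index, k=1):
    """Return the k document ids whose codes are closest in Hamming distance."""
    ranked = sorted(index.items(), key=lambda item: hamming(query_code, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical toy embeddings for three knowledge fragments.
planes = make_hyperplanes(dim=4, n_bits=16, seed=0)
embeddings = {
    "frag_a": [0.9, 0.1, 0.0, 0.2],
    "frag_b": [-0.8, 0.3, 0.5, -0.1],
    "frag_c": [0.1, -0.9, 0.4, 0.0],
}

# Only the binary codes are stored in the index, not the raw embeddings.
index = {doc_id: hash_code(vec, planes) for doc_id, vec in embeddings.items()}

query = [0.9, 0.1, 0.0, 0.2]
top = search(hash_code(query, planes), index, k=1)
```

The `n_bits` parameter corresponds to the code-length trade-off the abstract mentions: longer codes tend to separate fragments more finely at the cost of storage and comparison time. The sketch ranks by exhaustive comparison for clarity, whereas a deployed system would more plausibly use a sublinear index.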

Published

2026-06-11

How to Cite

Zhoukai Xue, Troy Fields, Ferry Hansson, & Dean Wabb. (2026). Privacy-Preserving Medical Knowledge Retrieval with Self-Supervised Hash Learning and Adversarial Defense for Healthcare AI Agents. Bioinformatics Insights and Analytics, 1(1). Retrieved from https://bioinfia.org/index.php/home/article/view/138