Generative Masking and Asymmetric Hash Learning for Large-Scale Physiological Signal Indexing in Digital Healthcare Systems
Keywords:
generative masking, asymmetric deep hashing, physiological signal indexing, digital healthcare systems, retrieval infrastructure, fairness, sustainability
Abstract
The rapid growth of ambulatory physiological monitoring has produced vast repositories of electrocardiogram, photoplethysmogram, and other biosignal waveforms, placing heavy pressure on retrieval and indexing infrastructure in digital healthcare systems. Traditional indexing approaches struggle to preserve clinically meaningful similarity relationships at scale while maintaining low-latency query performance. This paper addresses that gap by proposing a unified system architecture that couples generative masking with asymmetric hash learning for large-scale physiological signal indexing. Generative masking, informed by statistical priors over physiological dynamics, produces compact latent representations that suppress noise while amplifying diagnostically salient morphology. These representations are then mapped into binary hash codes through an asymmetric learning scheme in which the query and database sides follow distinct encoding pathways, and self-supervised semantic excavation aligns hash distances with hidden functional similarity. We describe the end-to-end system stack spanning edge preprocessing, cloud-based hash indexing, and federated governance layers, and we analyze how the interplay between masking and hashing addresses key structural trade-offs among retrieval precision, storage efficiency, and inference latency. Beyond technical performance, the paper examines fairness implications arising from population-specific masking priors, robustness to distributional drift in wearable sensor streams, sustainability considerations connected to model compression and edge deployment, and regulatory aspects of privacy-preserving similarity search under evolving health data protection frameworks.
The discussion foregrounds infrastructural design principles that can guide the integration of generative and hashing components into future digital health platforms, ensuring that large-scale physiological signal indexing remains clinically trustworthy, equitable, and operationally sustainable.
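The asymmetric retrieval step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the projection matrix, the stored database codes, the 4-bit code length, and the function names `encode_query`, `hamming`, and `search` are all hypothetical, and a fixed linear projection with a sign threshold stands in for the learned query encoder. The sketch only shows the asymmetry itself, where database codes are stored directly while a query is encoded on the fly and ranked by Hamming distance.

```python
from typing import List, Sequence, Tuple

Code = Tuple[int, ...]


def encode_query(latent: Sequence[float],
                 projection: Sequence[Sequence[float]]) -> Code:
    """Query-side pathway (assumed): linear projection, then sign threshold."""
    return tuple(
        1 if sum(w * x for w, x in zip(row, latent)) >= 0.0 else 0
        for row in projection
    )


def hamming(a: Code, b: Code) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query_code: Code, db_codes: List[Code], k: int = 2) -> List[int]:
    """Return indices of the k database codes closest in Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]


# Hypothetical 4-bit projection over a 3-dimensional latent vector.
PROJECTION = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, -1.0, 0.0],
]

# Database-side pathway (assumed): codes are held as stored binary values
# rather than recomputed through the query encoder.
DB_CODES = [
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (1, 0, 0, 1),
]

query = encode_query([0.9, -0.4, -0.2], PROJECTION)
top = search(query, DB_CODES, k=2)
print(query, top)
```

Because the database side holds only fixed-length binary codes, storage per record is bounded by the code length, and a query costs one encoder pass plus bitwise comparisons. This is the kind of precision, storage, and latency trade-off the abstract refers to, shown here only at toy scale.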
License
Copyright (c) 2026 Bioinformatics Insights and Analytics

This work is licensed under a Creative Commons Attribution 4.0 International License.



