Evolutionary and Population-Aware AI Models for Characterizing Global Diversity of Polymorphic Immune Genes Across Human Populations

Jean Gregory; Varun C. Chopra; Baurav Mhuja

Authors

Jean Gregory, School of Computing, Clemson University, Clemson, SC, USA.
Varun C. Chopra, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Baurav Mhuja, Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Keywords

evolutionary artificial intelligence, population genomics, immune gene diversity, large-scale systems, data governance, algorithmic fairness, genomic infrastructure

Abstract

Polymorphic immune genes, particularly the human leukocyte antigen system, exhibit extraordinary diversity across global human populations, shaped by millions of years of evolutionary pressure from pathogens, environmental factors, and demographic history. This diversity underpins critical differences in disease susceptibility, vaccine response, and transplantation compatibility, yet existing computational approaches for characterizing these genes are largely built on datasets dominated by individuals of European ancestry and fail to incorporate population structure or evolutionary dynamics. This paper presents a system-level examination of evolutionary and population-aware artificial intelligence models designed to characterize global diversity of polymorphic immune genes across human populations. We propose that such models must integrate principles from population genetics, evolutionary biology, and scalable machine learning architectures to produce robust, generalizable, and equitable typings. The discussion focuses on structural trade-offs in model design, data governance frameworks, computational infrastructure requirements, deployment strategies across diverse settings, sustainability of large-scale inference pipelines, fairness considerations in training and validation, and the policy implications for global genomic equity. We illustrate how explicit encoding of demographic histories and selective sweeps into model representations can reduce bias while improving predictive accuracy for underrepresented populations. The paper further examines the challenges of harmonizing heterogeneous long-read and short-read sequencing data across thousands of samples, the necessity of privacy-preserving architectures for sensitive genetic information, and the broader socio-technical infrastructure needed to support continuous learning from emerging population-level data. 
A case illustration based on the scalable framework for comprehensive typing from long-read data is used to contextualize these architectural decisions. The paper concludes by outlining a roadmap for future research that aligns technological development with ethical imperatives and global health priorities.
