Physics-Guided Equivariant Neural Networks for Protein Microenvironment Modeling and pKa Prediction

Authors

  • Beorge Eergusan, Department of Computer Science, Binghamton University, Binghamton, NY, USA.
  • Mailk Kalkarna, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA.
  • Logan Vega, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
  • Kenneth Riley, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.

Keywords:

Equivariant neural networks, protein pKa prediction, microenvironment modeling, deep learning, biophysics, SE(3) equivariance, physics-guided machine learning

Abstract

Predicting the pKa values of ionizable residues in proteins is a fundamental challenge in biophysics, with profound implications for drug design, enzyme engineering, and understanding biological mechanisms. The accuracy of pKa prediction depends critically on the faithful representation of the protein microenvironment, a three-dimensional chemical context characterized by electrostatic fields, hydrogen-bonding networks, and solvent exposure. Recent advances in geometric deep learning, particularly equivariant neural networks that respect the symmetries of physical space, offer a promising avenue for learning protein microenvironments directly from atomic coordinates. This paper presents a system-level analysis of physics-guided equivariant neural networks for microenvironment modeling and pKa prediction. We examine the architectural integration of SE(3)-equivariant message passing with physically inspired features such as continuum electrostatics and solvation free energies, discussing trade-offs between model expressiveness, interpretability, and computational efficiency. Beyond algorithmic design, we address the broader infrastructure required for training and deploying these models, including data curation from structural databases, high-performance computing considerations, and robustness to conformational variability. Furthermore, we explore governance and policy dimensions: the imperative for standardized benchmarks, fairness across diverse protein families, reproducibility through open-source release, and the ethical implications of precisely predicting protein properties that could be misused. We also consider sustainability metrics and the potential of amortized inference to mitigate the carbon footprint of large-scale training. By connecting architectural innovation with system-level thinking, this work charts a roadmap for responsible development of physics-guided equivariant models that can transform molecular therapeutics while adhering to principles of equity and scientific rigor.

Published

2026-06-15

How to Cite

Beorge Eergusan, Mailk Kalkarna, Logan Vega, & Kenneth Riley. (2026). Physics-Guided Equivariant Neural Networks for Protein Microenvironment Modeling and pKa Prediction. Bioinformatics Insights and Analytics, 1(1). Retrieved from https://bioinfia.org/index.php/home/article/view/145